We study the rate of decay of the probability of error for distinguishing
between a sparse signal with noise, modeled as a sparse mixture, and pure
noise. This problem has many applications in signal processing,
evolutionary biology, bioinformatics, astrophysics and feature selection
for machine learning. We let the mixture probability tend to zero as the
number of observations tends to infinity and derive oracle rates at which
the error probability can be driven to zero for a general class of signal
and noise distributions via the likelihood ratio test. In contrast to the
problem of detection of non-sparse signals, we see that the log-probability
of error decays sublinearly rather than linearly, is characterized through
the $\chi^2$-divergence rather than the Kullback-Leibler divergence for
"weak" signals, and can be independent of divergence for "strong" signals.
Our contribution is the first characterization of the rate of decay of the
error probability for this problem for both the false alarm and miss
probabilities.


1. Introduction. We consider the problem of detecting a sparse signal
in noise, modeled as a mixture, where the unknown sparsity level decreases
as the number of samples collected increases. Of particular interest is the
case where the unknown signal strength relative to the noise power is very
small. This problem has many natural applications. In signal processing,
applications include detecting a signal in a multi-channel system [10, 16]
and detecting covert communications [11]. In evolutionary biology, the
problem manifests in the reconstruction of phylogenetic trees in the
multi-species coalescent model [20]. In bioinformatics, the problem arises
in the context of determining gene expression from gene ontology datasets
[15]. In astrophysics, detection of sparse mixtures is used to compare
models of the cosmic microwave background to observed data [5]. Also,
statistics developed from the study of this problem have been applied in
machine learning to anomaly detection on graphs [22] and high-dimensional
feature selection when useful features are rare and weak [12].

Prior work on detecting a sparse signal in noise has been primarily focused
on Gaussian signal and noise models, with the goal of determining the
trade-off in signal strength with sparsity required for detection with
vanishing probability of error. In contrast, this work considers a fairly
general class of signal and noise models. Moreover, in this general class
of sparse signal and noise models, we provide the first analysis of the
rate at which the false alarm (Type-I) and miss detection (Type-II) error
probabilities vanish with sample size. We also provide simple-to-verify
conditions for detectability, which are derived using simpler tools than
previously used. In the problem of testing between $n$ i.i.d. samples from
two known distributions, it is well known that the rate at which the error
probability decays is $e^{-cn}$ for some constant $c > 0$ bounded by the
Kullback-Leibler divergence between the two distributions [6, 8]. In this
work, we show for the problem of detecting a sparse signal in noise that
the error probability for an oracle detector decays at a slower rate
determined by the sparsity level and the $\chi^2$-divergence between the
signal and noise distributions, with different behaviors possible depending
on the signal strength. In addition to determining the optimal trade-off
between signal strength and sparsity for consistent detection, an important
contribution in prior work has been the construction of adaptive (and, to
some extent, distribution-free) tests that achieve the optimal trade-off
without knowing the model parameters [1, 3, 4, 11, 16, 17, 23]. We discuss
prior work in more detail in Sec. 2.1. However, the adaptive tests that have
been proposed in these papers are not amenable to an analysis of the rate at
which the error probability goes to zero. We show that, in a Gaussian signal
and noise model, an adaptive test based on the sample maximum has miss
detection probability that vanishes at the optimal rate when the sparse
signal is sufficiently strong.

2. Problem Setup. Let $\{f_{0,n}(x)\}$, $\{f_{1,n}(x)\}$ be sequences of
probability density functions (PDFs) for real-valued random variables.

We consider the following sequence of composite hypothesis testing
problems with sample size $n$, called the (sparse) mixture detection problem:

(2.1) $H_{0,n}: X_1, \dots, X_n \sim f_{0,n}(x)$ i.i.d. (null)

(2.2) $H_{1,n}: X_1, \dots, X_n \sim (1-\epsilon_n) f_{0,n}(x) + \epsilon_n f_{1,n}(x)$ i.i.d. (alternative)

where $\{f_{0,n}(x)\}$ is known, $\{f_{1,n}(x)\}$ is from some known family
$\mathcal{F}$ of sequences of PDFs, and $\{\epsilon_n\}$ is a sequence of
positive numbers such that $\epsilon_n \to 0$. We will also assume
$n\epsilon_n \to \infty$ so that a typical realization of the alternative is
distinguishable from the null.

Let $P_{0,n}$, $P_{1,n}$ denote the probability measures under $H_{0,n}$,
$H_{1,n}$ respectively, and let $E_{0,n}$, $E_{1,n}$ be the corresponding
expectations, with respect to the particular $\{f_{0,n}(x)\}$,
$\{f_{1,n}(x)\}$ and $\{\epsilon_n\}$. When convenient, we will drop the
subscript $n$. Let $L_n \triangleq f_{1,n}/f_{0,n}$ denote the likelihood
ratio. When $f_{0,n}(x) = f_0(x)$ and $f_{1,n}(x) = f_0(x - \mu_n)$,
we say that the model is a location model. For the purposes of presentation,
we will assume that $\{\mu_n\}$ is a positive and monotone sequence. When
$f_0(x)$ is a standard normal PDF, we call the location model a Gaussian
location model. The distributions of the alternative in a location model are
described by the set of sequences $\{(\epsilon_n, \mu_n)\}$.

The location model can be considered as one where the null corresponds
to pure noise, while the alternative corresponds to a sparse signal
(controlled by $\epsilon_n$), with signal strength $\mu_n$ contaminated by
additive noise. The relationship between $\epsilon_n$ and $\mu_n$ determines
the signal-to-noise ratio (SNR), and characterizes when the hypotheses can
be distinguished with vanishing probability of error. In the general case,
$f_{0,n}(x)$ can be thought of as the noise and $f_{1,n}(x)$ as the signal
distribution.
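To make the two hypotheses concrete, the following is a minimal sketch of how one might simulate data from the Gaussian location model. The function names and the particular values of `n`, `beta`, and the signal strength are illustrative assumptions, not taken from the paper.

```python
import math
import random

def sample_null(n, rng):
    """Draw n i.i.d. N(0, 1) observations (pure noise, f_0 standard normal)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sample_alternative(n, eps, mu, rng):
    """Draw n i.i.d. observations from (1 - eps) N(0, 1) + eps N(mu, 1)."""
    xs = []
    for _ in range(n):
        # Each sample carries the signal independently with probability eps.
        shift = mu if rng.random() < eps else 0.0
        xs.append(rng.gauss(shift, 1.0))
    return xs

rng = random.Random(0)
n, beta = 10_000, 0.6                      # illustrative values only
eps = n ** (-beta)                         # sparsity level eps_n = n^{-beta}
mu = math.sqrt(2 * 0.5 * math.log(n))      # an arbitrary signal strength
x_alt = sample_alternative(n, eps, mu, rng)
print(len(x_alt), n * eps)                 # n * eps = expected number of signal samples
```

Note that only about `n * eps` of the `n` samples carry the signal on average, which is why the sample size relevant for detection is much smaller than `n`.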

We define the probability of false alarm for a hypothesis test $\delta_n$
between $H_{0,n}$ and $H_{1,n}$ as

(2.3) $P_{FA}(n) = P_{0,n}[\delta_n = 1]$

and the probability of missed detection as

(2.4) $P_{MD}(n) = P_{1,n}[\delta_n = 0]$.

A sequence of hypothesis tests $\{\delta_n\}$ is consistent if
$P_{FA}(n), P_{MD}(n) \to 0$ as $n \to \infty$. We say we have a rate
characterization for a sequence of consistent hypothesis tests if we can
write

(2.5) $\lim_{n\to\infty} \frac{\log P_{FA}(n)}{g_0(n)} = -c, \qquad \lim_{n\to\infty} \frac{\log P_{MD}(n)}{g_1(n)} = -d,$

where $g_0(n), g_1(n) \to \infty$ as $n \to \infty$ and $0 < c, d < \infty$.
The rate characterization describes the decay of the error probabilities for
large sample sizes. All logarithms are natural. For the problem of testing
between i.i.d. samples from two fixed distributions, $g_0(n) = g_1(n) = n$,
and $c, d$ are called the error exponents [6]. In the mixture detection
problem, $g_0(n)$ and $g_1(n)$ will be sublinear functions of $n$.

The log-likelihood ratio between the corresponding probability measures
of $H_{1,n}$ and $H_{0,n}$ is

(2.6) $\mathrm{LLR}(n) = \sum_{i=1}^{n} \log\left(1 - \epsilon_n + \epsilon_n L_n(X_i)\right).$

In order to perform an oracle rate characterization for the mixture detection
problem, we consider the sequence of oracle likelihood ratio tests (LRTs)
between $H_{0,n}$ and $H_{1,n}$ (i.e. with $\epsilon_n, f_{0,n}, f_{1,n}$ known):

(2.7) $\delta_n(X_1, \dots, X_n) = \begin{cases} 1 & \mathrm{LLR}(n) \geq 0 \\ 0 & \text{otherwise.} \end{cases}$

It is well known that (2.7) is optimal for testing between $H_{0,n}$ and
$H_{1,n}$ in the sense of minimizing $\frac{P_{FA}(n) + P_{MD}(n)}{2}$, which
is the average probability of error when the null and alternative are
assumed to be equally likely [18, 21]. It is valuable to analyze $P_{FA}(n)$
and $P_{MD}(n)$ separately since many applications incur different costs
associated with false alarms and missed detections.
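As an illustration of (2.6) and (2.7), here is a minimal sketch of the oracle LRT specialized to the Gaussian location model, where the likelihood ratio of $N(\mu, 1)$ against $N(0, 1)$ is $e^{\mu x - \mu^2/2}$. The function names are hypothetical and the Gaussian specialization is an assumption made for the example; the test in the text applies to general $f_{0,n}$, $f_{1,n}$.

```python
import math

def likelihood_ratio(x, mu):
    """L_n(x) = f_1(x) / f_0(x) for N(mu, 1) against N(0, 1)."""
    return math.exp(mu * x - 0.5 * mu * mu)

def llr(xs, eps, mu):
    """Log-likelihood ratio (2.6): sum_i log(1 - eps + eps * L_n(x_i))."""
    # log1p keeps precision when eps * (L_n - 1) is small.
    return sum(math.log1p(eps * (likelihood_ratio(x, mu) - 1.0)) for x in xs)

def oracle_lrt(xs, eps, mu):
    """Oracle test (2.7): decide for the alternative (1) when LLR(n) >= 0."""
    return 1 if llr(xs, eps, mu) >= 0.0 else 0
```

The test is an "oracle" because it uses the true `eps` and `mu`, which are unknown in practice; it serves as the benchmark against which adaptive tests are compared.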

Location Model: The detectable region for a location model is the set
of sequences $\{(\epsilon_n, \mu_n)\}$ such that a sequence of consistent
oracle tests $\{\delta_n\}$ exists. For convenience of analysis, we
introduce the parameterization

(2.8) $\epsilon_n = n^{-\beta}$

where $\beta \in (0,1)$ as necessary. Following the terminology of [1], when
$\beta \in (0, \frac{1}{2})$, the mixture is said to be a "dense mixture". If
$\beta \in (\frac{1}{2}, 1)$, the mixture is said to be a "sparse mixture".


2.1. Related Work. Prior work on mixture detection has been focused
primarily on the Gaussian location model. The main goals in these works
have been to determine the detectable region and construct optimally
adaptive tests (i.e. those which are consistent independent of knowledge of
$\{(\epsilon_n, \mu_n)\}$, whenever possible). The study of detection of
mixtures where the mixture probability tends to zero was initiated by
Ingster for the Gaussian location model [16]. Ingster characterized the
detectable region, and showed that outside the detectable region the sum of
the probabilities of false alarm and missed detection is bounded away from
zero for any test. Since the generalized likelihood statistic tends to
infinity under the null, Ingster developed an increasing sequence of simple
hypothesis tests that are optimally adaptive.

Donoho and Jin introduced the Higher Criticism test, which is optimally
adaptive and is computationally efficient relative to Ingster's sequence of
hypothesis tests, and also discussed some extensions to Subbotin
distributions and $\chi^2$-distributions [11]. Cai et al. extended these
results to the case where $f_{0,n}(x)$ is standard normal and $f_{1,n}(x)$
is a normal distribution with positive variance, derived limiting
expressions for the distribution of LLR(n) under both hypotheses, and showed
that the Higher Criticism test is optimally adaptive in this case [3]. Jager
and Wellner proposed a family of tests based on $\phi$-divergences and
showed that they attain the full detectable region in the Gaussian location
model [17]. Arias-Castro and Wang studied a location model where
$f_{0,n}(x)$ is some fixed but unknown symmetric distribution, and
constructed an optimally adaptive test that relies only on the symmetry of
the distribution when $\mu_n > 0$ [1]. In a separate paper, Arias-Castro
and Wang also considered mixtures of Poisson distributions and showed the
problem had similar detectability behavior to the Gaussian location model
[2].

Cai and Wu gave an information-theoretic characterization of the
detectable region via an analysis of the sharp asymptotics of the Hellinger
distance for a wide variety of distributions, and established a strong
converse result showing that reliable detection is impossible outside the
detectable region in many cases if (2.7) is not consistent [4]. This work
also gave general conditions for the Higher Criticism test to be consistent.
Our work complements [4] by providing conditions for consistency (as well
as asymptotic estimates of error probabilities) for optimal tests, with
simple-to-verify conditions for a fairly general class of models. While the
Hellinger distance used in [4] provides bounds on $P_{FA}(n) + P_{MD}(n)$
for the test specified in (2.7), our analysis treats $P_{FA}(n)$,
$P_{MD}(n)$ separately as they may have different rates at which they tend
to zero and different acceptable tolerances in applications. As we will show
in Sec. 3.2 and Sec. 4, there are cases where $P_{FA}(n) \gg P_{MD}(n)$ for
adaptive tests and $P_{FA}(n) \ll P_{MD}(n)$ for an oracle test.

Walther numerically showed that while the popular Higher Criticism
statistic is consistent, there exist optimally adaptive tests with
significantly higher power for a given sample size at different sparsity
levels [23]. Our work complements [23] by providing a benchmark to
meaningfully compare the sample size and sparsity trade-offs of different
tests with an oracle test. It should be noted that all of the work except
[1, 3] has focused on the case where $\beta > \frac{1}{2}$, and no prior
work has provided an analysis of the rate at which $P_{FA}(n)$, $P_{MD}(n)$
can be driven to zero with sample size.

3. Main Results for Rate Analysis. 

3.1. General Case. Our main result is a characterization of the oracle 
rate via the test given in (2.7). The sufficient conditions required for the 
rate characterization are applicable to a broad range of parameters in the 
Gaussian location model (Sec. 3.2). 

We first look at the behavior of "weak signals", where $L_n$ has suitably
controlled tails under the null hypothesis. In the Gaussian location model
in Sec. 3.2, this theorem is applicable to small detectable $\mu_n$.


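Theorem 3.1. Let $\gamma_0 \in (0,1)$ and assume that for all
$\gamma \in (0, \gamma_0)$ the following conditions are satisfied:

(3.1) $\lim_{n\to\infty} E_0\left[\frac{(L_n - 1)^2}{D_n^2}\, 1_{\{L_n \geq 1 + \gamma/\epsilon_n\}}\right] = 0$

(3.2) $\epsilon_n D_n \to 0$

(3.3) $\sqrt{n}\,\epsilon_n D_n \to \infty$

where

(3.4) $D_n^2 = E_0\left[(L_n - 1)^2\right] < \infty.$

Then for the test specified by (2.7),

(3.5) $\lim_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n^2 D_n^2} = -\frac{1}{8}.$

Moreover, (3.5) holds if we replace $P_{FA}(n)$ with $P_{MD}(n)$.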


The quantity $D_n^2$ is known as the $\chi^2$-divergence between
$f_{0,n}(x)$ and $f_{1,n}(x)$ [14]. In contrast to the problem of testing
between i.i.d. samples from two fixed distributions [8], the rate is not
characterized by the Kullback-Leibler divergence for the mixture detection
problem.


Proof. We provide a sketch of the proof for $P_{FA}(n)$, and leave the
details to the Supplemental Material. We first establish that

(3.6) $\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n^2 D_n^2} \leq -\frac{1}{8}.$

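By the Chernoff bound applied to $P_{FA}(n)$ and noting $X_1, \dots, X_n$
are i.i.d.,

$P_{FA}(n) = P_0[\mathrm{LLR}(n) \geq 0] \leq \left(\min_{0 \leq s \leq 1} E_0\left[(1 - \epsilon_n + \epsilon_n L_n(X_1))^s\right]\right)^n$

(3.7) $\qquad \leq \left(E_0\left[\sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)}\right]\right)^n.$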


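By direct computation, we see $E_0[L_n(X_1) - 1] = 0$, and the following
sequence of inequalities holds:

$E_0\left[\sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)}\right]$

$\quad = 1 - \frac{1}{2} E_0\left[\frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{\left(1 + \sqrt{1 + \epsilon_n(L_n(X_1) - 1)}\right)^2}\right]$

$\quad \leq 1 - \frac{\epsilon_n^2}{2} E_0\left[\frac{(L_n(X_1) - 1)^2}{\left(1 + \sqrt{1 + \epsilon_n(L_n(X_1) - 1)}\right)^2}\, 1_{\{L_n(X_1) < 1 + \gamma/\epsilon_n\}}\right]$

$\quad \leq 1 - \frac{\epsilon_n^2}{2(1 + \sqrt{1+\gamma})^2} E_0\left[(L_n(X_1) - 1)^2\, 1_{\{L_n(X_1) < 1 + \gamma/\epsilon_n\}}\right]$

$\quad = 1 - \frac{\epsilon_n^2 D_n^2}{2(1 + \sqrt{1+\gamma})^2} \left(1 - E_0\left[\frac{(L_n(X_1) - 1)^2}{D_n^2}\, 1_{\{L_n(X_1) \geq 1 + \gamma/\epsilon_n\}}\right]\right).$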


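Since the expectation in the previous line tends to zero by (3.1), for
sufficiently large $n$ it will become smaller than $\gamma$. Therefore we
have by (3.7)

$\frac{\log P_{FA}(n)}{n} \leq \log\left(1 - \frac{\epsilon_n^2 D_n^2}{2(1 + \sqrt{1+\gamma})^2}(1 - \gamma)\right).$

Dividing both sides by $\epsilon_n^2 D_n^2$ and taking the limsup using
(3.2), (3.3) establishes
$\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n^2 D_n^2} \leq -\frac{1-\gamma}{2(1+\sqrt{1+\gamma})^2}$.
Since $\gamma$ can be arbitrarily small, (3.6) is established.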
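The first equality in the chain of inequalities rests on the algebraic identity $\sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{2(1+\sqrt{1+x})^2}$ applied with $x = \epsilon_n(L_n - 1)$, whose mean is zero under the null. The short sketch below checks this identity numerically and evaluates the constant $\frac{1-\gamma}{2(1+\sqrt{1+\gamma})^2}$, which tends to $\frac{1}{8}$ as $\gamma \to 0$. The function names are illustrative only.

```python
import math

def sqrt_expansion(x):
    """Right-hand side of sqrt(1+x) = 1 + x/2 - x^2 / (2 (1 + sqrt(1+x))^2)."""
    r = math.sqrt(1.0 + x)
    return 1.0 + 0.5 * x - x * x / (2.0 * (1.0 + r) ** 2)

def upper_bound_constant(gamma):
    """Constant (1 - gamma) / (2 (1 + sqrt(1 + gamma))^2) from the limsup bound."""
    return (1.0 - gamma) / (2.0 * (1.0 + math.sqrt(1.0 + gamma)) ** 2)

# The identity holds for every x >= -1.
for x in (-0.9, -0.1, 0.0, 0.3, 5.0, 1e3):
    assert math.isclose(math.sqrt(1.0 + x), sqrt_expansion(x), rel_tol=1e-9)

# The constant increases to 1/8 as gamma decreases to 0.
for gamma in (0.5, 0.1, 0.01, 0.0):
    print(gamma, upper_bound_constant(gamma))
```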

We now establish that

(3.8) $\liminf_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n^2 D_n^2} \geq -\frac{1}{8}.$

The proof of (3.8) is similar to that of Cramér's theorem (Theorem 1.4, [9]).
The key difference from Cramér's theorem is that LLR(n) is the sum of
i.i.d. random variables for each $n$, but the distributions of the summands
defining LLR(n) in (2.6) change for each $n$ under either hypothesis. Thus,
we modify the proof of Cramér's theorem by introducing an $n$-dependent
tilted distribution, and replacing the standard central limit theorem (CLT)
with the Lindeberg-Feller CLT for triangular arrays (Theorem 3.4.5, [13]).

We introduce the tilted distribution $\tilde{f}_n(x)$ corresponding to
$f_{0,n}(x)$ by

(3.9) $\tilde{f}_n(x) = \frac{\left(1 - \epsilon_n + \epsilon_n L_n(x)\right)^{s_n}}{\Lambda_n(s_n)}\, f_{0,n}(x)$

where $\Lambda_n(s) = E_0\left[(1 - \epsilon_n + \epsilon_n L_n(X_1))^s\right]$,
which is convex with $\Lambda_n(0) = \Lambda_n(1) = 1$, and
$s_n = \arg\min_{0 \leq s \leq 1} \Lambda_n(s)$. Let $\tilde{P}$, $\tilde{E}$
denote the tilted measure and expectation, respectively (where we suppress
the $n$ for clarity). A standard dominated convergence argument (Lemma
2.2.5, [8]) shows that

(3.10) $\tilde{E}\left[\log\left(1 - \epsilon_n + \epsilon_n L_n(X_1)\right)\right] = 0.$

Define the variance of the log-likelihood ratio for one sample as

(3.11) $\sigma_n^2 = \tilde{E}\left[\left(\log\left(1 + \epsilon_n(L_n(X_1) - 1)\right)\right)^2\right].$

For sufficiently large $n$ such that Lemma 7.1 (proved in Supplementary
Material) holds, namely that $\sigma_n^2 \leq C_1 \epsilon_n^2 D_n^2$ for a
constant $C_1$, we have:

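$P_{FA}(n) = P_0[\mathrm{LLR}(n) \geq 0] = E_0\left[1_{\{\mathrm{LLR}(n) \geq 0\}}\right]$

$\quad = (\Lambda_n(s_n))^n\, \tilde{E}\left[e^{-s_n \mathrm{LLR}(n)}\, 1_{\{\mathrm{LLR}(n) \geq 0\}}\right]$

$\quad = (\Lambda_n(s_n))^n\, \tilde{E}\left[e^{-s_n \mathrm{LLR}(n)} \,\middle|\, \mathrm{LLR}(n) \geq 0\right] \tilde{P}[\mathrm{LLR}(n) \geq 0]$

(3.12) $\quad \geq (\Lambda_n(s_n))^n \exp\left(-\frac{s_n \tilde{E}\left[\mathrm{LLR}(n)\, 1_{\{\mathrm{LLR}(n) \geq 0\}}\right]}{\tilde{P}[\mathrm{LLR}(n) \geq 0]}\right) \tilde{P}[\mathrm{LLR}(n) \geq 0]$

(3.13) $\quad \geq (\Lambda_n(s_n))^n \exp\left(-\frac{\tilde{E}\left[|\mathrm{LLR}(n)|\right]}{\tilde{P}[\mathrm{LLR}(n) \geq 0]}\right) \tilde{P}[\mathrm{LLR}(n) \geq 0]$

(3.14) $\quad \geq (\Lambda_n(s_n))^n \exp\left(-\frac{\sqrt{\tilde{E}\left[(\mathrm{LLR}(n))^2\right]}}{\tilde{P}[\mathrm{LLR}(n) \geq 0]}\right) \tilde{P}[\mathrm{LLR}(n) \geq 0]$

$\quad = (\Lambda_n(s_n))^n \exp\left(-\frac{\sqrt{n \sigma_n^2}}{\tilde{P}[\mathrm{LLR}(n) \geq 0]}\right) \tilde{P}[\mathrm{LLR}(n) \geq 0]$

(3.15) $\quad \geq (\Lambda_n(s_n))^n \exp\left(-\frac{\sqrt{n C_1 \epsilon_n^2 D_n^2}}{\tilde{P}[\mathrm{LLR}(n) \geq 0]}\right) \tilde{P}[\mathrm{LLR}(n) \geq 0]$

where (3.12) follows from Jensen's inequality, (3.13) by $s_n \leq 1$ and
$\mathrm{LLR}(n)\, 1_{\{\mathrm{LLR}(n) \geq 0\}} \leq |\mathrm{LLR}(n)|$,
(3.14) by Jensen's inequality, and (3.15) by Lemma 7.1 proved in the
Supplementary Material.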

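Taking logarithms and dividing through by $n \epsilon_n^2 D_n^2$ gives

$\frac{\log P_{FA}(n)}{n \epsilon_n^2 D_n^2} \geq \frac{\log \Lambda_n(s_n)}{\epsilon_n^2 D_n^2} - \frac{\sqrt{C_1}}{\tilde{P}[\mathrm{LLR}(n) \geq 0]\, \sqrt{n}\, \epsilon_n D_n} + \frac{\log \tilde{P}[\mathrm{LLR}(n) \geq 0]}{n \epsilon_n^2 D_n^2}.$

Taking the liminf and applying Lemma 7.2, in which it is established that
$\tilde{P}[\mathrm{LLR}(n) \geq 0] \to \frac{1}{2}$, and Lemma 7.4, in which
it is established that
$\liminf_{n\to\infty} \frac{\log \Lambda_n(s_n)}{\epsilon_n^2 D_n^2} \geq -\frac{1}{8}$
(see Supplementary Material), along with the assumption
$n \epsilon_n^2 D_n^2 \to \infty$ establishes that
$\liminf_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n^2 D_n^2} \geq -\frac{1}{8}$.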

The analysis under $H_{1,n}$ for $P_{MD}(n)$ relies on the fact that the
$X_i$ are i.i.d. with PDF $(1 - \epsilon_n + \epsilon_n L_n) f_{0,n}(x)$,
which allows the use of $1 - \epsilon_n + \epsilon_n L_n$ to change the
measure from the alternative to the null. The upper bound is established
identically, by noting that the Chernoff bound furnishes

$P_{MD}(n) = P_{1,n}[-\mathrm{LLR}(n) > 0] \leq \left(E_1\left[\frac{1}{\sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)}}\right]\right)^n = \left(E_0\left[\sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)}\right]\right)^n.$

Similarly, the previous analysis can be applied to show that (3.8) holds with
$P_{FA}(n)$ replaced with $P_{MD}(n)$. □


In order to study the behavior of tests when Thm 3.1 does not hold, we
rely on the following bounds for $P_{MD}(n)$, $P_{FA}(n)$:

Theorem 3.2. (a) Let $\{\delta_n\}$ be any sequence of tests such that

$\limsup_{n\to\infty} P_{FA}(n) < 1$;

then,

(3.16) $\liminf_{n\to\infty} \frac{\log P_{MD}(n)}{n \epsilon_n} \geq -1.$

(b) The following upper and lower bounds for $P_{FA}(n)$ hold for the test
specified by (2.7):

(3.17) $P_{FA}(n) \leq 1 - \left(P_0[L_n < 1]\right)^n$

(3.18) $P_{FA}(n) \geq P_0\left[\sum_{i=1}^{n} \log \max\left(1 - \epsilon_n,\ \epsilon_n L_n(X_i)\right) \geq 0\right].$

These bounds are easily proved by noting that if all observations under
$H_{1,n}$ come from $f_{0,n}$, then a miss detection occurs (a), and that at
least one sample must have $L_n \geq 1$ in order to raise a false alarm (b).
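The normalization $n\epsilon_n$ in part (a) can be understood from the event that none of the $n$ samples is drawn from the signal component, which has probability $(1-\epsilon_n)^n$ under the alternative. The sketch below, with an illustrative choice $\epsilon_n = n^{-\beta}$ and a hypothetical function name, shows that the logarithm of this probability divided by $n\epsilon_n$ approaches $-1$.

```python
import math

def all_noise_log_prob_ratio(n, beta):
    """log((1 - eps_n)^n) / (n * eps_n) with eps_n = n^{-beta}."""
    eps = n ** (-beta)
    # n * log(1 - eps) / (n * eps) simplifies to log(1 - eps) / eps.
    return math.log1p(-eps) / eps

for n in (10**2, 10**4, 10**6, 10**8):
    print(n, all_noise_log_prob_ratio(n, beta=0.6))
```

The printed ratios increase toward $-1$ from below, matching the fundamental limit $-1$ in (3.16).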

Note that these are universal bounds in the sense that they impose no
conditions on $f_{1,n}(x)$, $f_{0,n}(x)$ and $\epsilon_n$. Also note that the
bound of Thm 3.2(a) is independent of any divergences between $f_{0,n}(x)$
and $f_{1,n}(x)$, and it holds for any consistent sequence of tests because
$P_{FA}(n) \to 0$. This is in contrast to the problem of testing between
i.i.d. samples from fixed distributions, where the rate is a function of
divergence [8].

When the conditions of Thm 3.1 do not hold, we have the following rate
characterization for "strong signals", where $L_n$ is large under the
$f_{1,n}(x)$ distribution in an appropriate sense. In the Gaussian location
model in Sec. 3.2, this theorem is applicable to large detectable $\mu_n$.

Theorem 3.3. Let $M_0 > 1$, and assume that for all $M > M_0$, the
following condition is satisfied:

(3.19) $\lim_{n\to\infty} E_0\left[L_n\, 1_{\{L_n \geq 1 + M/\epsilon_n\}}\right] = 1.$

Then for the test specified by (2.7),

(3.20) $\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n} \leq -1$

(3.21) $\lim_{n\to\infty} \frac{\log P_{MD}(n)}{n \epsilon_n} = -1.$


Proof. We first prove (3.20). Let

$\phi(x) = 1 + sx - (1 + x)^s.$

By Taylor's theorem, we see for $s \in (0,1)$ and $x \geq -1$ that
$\phi(x) \geq 0$. Since $E_0[L_n - 1] = 0$,

$E_0\left[(1 - \epsilon_n + \epsilon_n L_n(X_1))^s\right] = 1 - E_0\left[\phi(\epsilon_n(L_n(X_1) - 1))\right].$

Note this implies $E_0[\phi(\epsilon_n(L_n(X_1) - 1))] \in [0,1]$ since
$E_0[(1 - \epsilon_n + \epsilon_n L_n(X_1))^s]$ is convex in $s$ and is 1
for $s = 0, 1$. As in the proof of Thm 3.1, by the Chernoff bound,

$P_{FA}(n) \leq \left(E_0\left[(1 - \epsilon_n + \epsilon_n L_n(X_1))^s\right]\right)^n$

for any $s \in (0,1)$. Thus, suppressing the dependence on $X_1$, and
assuming $M > M_0$, we have


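$\frac{\log P_{FA}(n)}{n} \leq \log E_0\left[(1 - \epsilon_n + \epsilon_n L_n)^s\right]$

$\quad = \log\left(1 - E_0\left[\phi(\epsilon_n(L_n - 1))\right]\right)$

(3.22) $\quad \leq -E_0\left[\phi(\epsilon_n(L_n - 1))\right]$

(3.23) $\quad \leq -E_0\left[\phi(\epsilon_n(L_n - 1))\, 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad = -E_0\left[\left(1 + s\epsilon_n(L_n - 1) - (1 + \epsilon_n(L_n - 1))^s\right) 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad \leq -E_0\left[\left(s\epsilon_n(L_n - 1) - (1 + \epsilon_n(L_n - 1))^s\right) 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

(3.24) $\quad \leq -E_0\left[\left(s\epsilon_n(L_n - 1) - 2^s (\epsilon_n(L_n - 1))^s\right) 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad = -E_0\left[\left(s - \frac{2^s}{(\epsilon_n(L_n - 1))^{1-s}}\right) \epsilon_n(L_n - 1)\, 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad \leq -E_0\left[\left(s - \frac{2^s}{M^{1-s}}\right) \epsilon_n(L_n - 1)\, 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

(3.25) $\quad \leq -\epsilon_n \left(s - \frac{2}{M^{1-s}}\right) E_0\left[L_n \left(1 - \frac{1}{L_n}\right) 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad \leq -\epsilon_n \left(s - \frac{2}{M^{1-s}}\right) E_0\left[L_n \left(1 - \frac{1}{1+M}\right) 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$

$\quad = -\epsilon_n \left(s - \frac{2}{M^{1-s}}\right) \left(1 - \frac{1}{1+M}\right) E_0\left[L_n\, 1_{\{\epsilon_n(L_n - 1) \geq M\}}\right]$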


where (3.22) follows from $\log(1 - x) \leq -x$ for $x < 1$, (3.23) follows
from $\phi(x) \geq 0$, (3.24) follows from $(1 + x)^s \leq 2^s x^s$ for
$x \geq 1$ and taking $M > M_0$, and (3.25) follows from $s \in (0,1)$. The
last two steps use that $s > 2/M^{1-s}$ for $M$ sufficiently large. Dividing
both sides of the inequality by $\epsilon_n$ and taking the limsup as
$n \to \infty$ using (3.19) establishes

$\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{n \epsilon_n} \leq -\left(s - \frac{2}{M^{1-s}}\right)\left(1 - \frac{1}{1+M}\right).$

Letting $M \to \infty$ and optimizing over $s \in (0,1)$ establishes (3.20).
By a change of measure between the alternative and null hypotheses, we see
that (3.20) also holds with $P_{FA}(n)$ replaced with $P_{MD}(n)$. Combining
this with Thm 3.2 establishes (3.21). □
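The two elementary inequalities used in the proof, $\phi(x) \geq 0$ for $x \geq -1$ and $(1+x)^s \leq 2^s x^s$ for $x \geq 1$, are easy to check numerically. The sketch below does so on a small grid and also evaluates the bound $-\left(s - \frac{2}{M^{1-s}}\right)\left(1 - \frac{1}{1+M}\right)$ obtained from the last line of the chain, to illustrate how it approaches $-s$ as $M$ grows. The function names and grid values are illustrative assumptions.

```python
def phi(x, s):
    """phi(x) = 1 + s*x - (1 + x)^s, nonnegative for s in (0, 1) and x >= -1."""
    return 1.0 + s * x - (1.0 + x) ** s

def rate_bound(s, M):
    """Bound -(s - 2 / M^(1-s)) * (1 - 1/(1+M)) from the last line of the chain."""
    return -(s - 2.0 / M ** (1.0 - s)) * (1.0 - 1.0 / (1.0 + M))

TOL = 1e-12  # slack for floating-point rounding
for s in (0.1, 0.5, 0.9):
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0, 10.0, 100.0):
        assert phi(x, s) >= -TOL
    for x in (1.0, 2.0, 10.0, 1e6):
        assert (1.0 + x) ** s <= 2.0 ** s * x ** s + TOL

# For fixed s the bound tends to -s as M grows; s close to 1 then gives -1.
for M in (1e2, 1e4, 1e8):
    print(M, rate_bound(0.5, M))
```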


Theorem 3.3 shows that the rate of miss detection is controlled by the
average number of observations drawn from $f_{1,n}(x)$ under $H_{1,n}$,
independent of any divergence between $f_{1,n}(x)$ and $f_{0,n}(x)$ when
(3.19) holds. Interestingly, so long as the condition of Thm 3.3 holds, by
Thm 3.2(a), no non-trivial sequence of tests (i.e.
$\limsup_{n\to\infty} P_{FA}(n), P_{MD}(n) < 1$) can achieve a better rate
than (2.7) under $H_{1,n}$. This is different from the case of testing
i.i.d. observations from two fixed distributions, where allowing for a
slower rate of decay for $P_{FA}(n)$ can allow for a faster rate of decay
for $P_{MD}(n)$ (Sec. 3.4, [8]).

In Sec. 3.2, we will show that Thm 3.3 is not always tight under $H_{0,n}$,
and the true behavior can depend on divergence between $f_{0,n}(x)$ and
$f_{1,n}(x)$, using the upper and lower bounds of Thm 3.2(b).

3.1.1. Comparison to Related Work. Cai and Wu [4] consider a model
which is essentially as general as ours, and characterize the detection
boundary for many cases of interest, but do not perform a rate analysis.
Note that our rate characterization (3.5) depends on the
$\chi^2$-divergence $D_n^2$ between $f_{0,n}$ and $f_{1,n}$. While the
Hellinger distance used in [4] can be upper bounded in terms of the
$\chi^2$-divergence, a corresponding lower bound does not exist in general
[14], and so our results cannot be derived using the methods of [4]. In
fact, our results complement [4] in giving precise bounds on the error decay
for this problem once the detectable region boundary has been established.
Furthermore, as we will show in Thm 4.1, there are cases where the rates
derived by analyzing the likelihood ratio test are essentially achievable.

3.2. Gaussian Location Model. In this section, we specialize Thm 3.1 
and 3.3 to the Gaussian location model. The rate characterization proved is 
summarized in Fig. 1. We first recall some results from the literature for the 
detectable region for this model. 

Fig 1: Detectable regions for the Gaussian location model. Unshaded regions
have $P_{MD}(n) + P_{FA}(n) \to 1$ for any test (i.e. reliable detection is
impossible). Green regions are where Corollaries 3.5 and 3.6 provide an
exact rate characterization. The red region is where Thm 3.8 provides an
upper bound on the rate, but no lower bound. The blue region is where
Cor. 3.7 holds, and provides an upper bound on the rate for $P_{FA}(n)$ and
an exact rate characterization for $P_{MD}(n)$.

Theorem 3.4. The boundary of the detectable region (in
$\{(\epsilon_n, \mu_n)\}$ space) is given by (with $\epsilon_n = n^{-\beta}$):

1. If $0 < \beta < \frac{1}{2}$, then $\mu_{crit,n} = n^{\beta - \frac{1}{2}}$. (Dense)

2. If $\frac{1}{2} < \beta < \frac{3}{4}$, then $\mu_{crit,n} = \sqrt{2(\beta - \frac{1}{2}) \log n}$. (Moderately Sparse)

3. If $\frac{3}{4} < \beta < 1$, then $\mu_{crit,n} = \sqrt{2(1 - \sqrt{1 - \beta})^2 \log n}$. (Very Sparse)

If in the dense case $\mu_n = n^r$, then the LRT (2.7) is consistent if
$r > \beta - \frac{1}{2}$. Moreover, if $r < \beta - \frac{1}{2}$, then
$P_{FA}(n) + P_{MD}(n) \to 1$ for any sequence of tests as $n \to \infty$.
If in the sparse cases $\mu_n = \sqrt{2r \log n}$, then the LRT is
consistent if $\mu_n > \mu_{crit,n}$. Moreover, if $\mu_n < \mu_{crit,n}$,
then $P_{FA}(n) + P_{MD}(n) \to 1$ for any sequence of tests as
$n \to \infty$.
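The three regimes of the detection boundary can be collected into a single function. The following is a minimal sketch; the function name is hypothetical, and the assignment of the endpoints $\beta = \frac{1}{2}$ and $\beta = \frac{3}{4}$ to particular branches is an assumption made only so that the function is defined for every $\beta \in (0,1)$.

```python
import math

def mu_crit(n, beta):
    """Critical signal strength for the Gaussian location model, eps_n = n^{-beta}."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    if beta <= 0.5:
        # Dense regime: polynomially small signals can be detected.
        return n ** (beta - 0.5)
    if beta < 0.75:
        # Moderately sparse regime.
        return math.sqrt(2.0 * (beta - 0.5) * math.log(n))
    # Very sparse regime.
    return math.sqrt(2.0 * (1.0 - math.sqrt(1.0 - beta)) ** 2 * math.log(n))

for beta in (0.25, 0.6, 0.75, 0.9):
    print(beta, mu_crit(10**6, beta))
```

The two sparse expressions agree at $\beta = \frac{3}{4}$, so the boundary is continuous there.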


Proof. For the proof see [1, 3, 11]. □


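We call the set of $\{(\epsilon_n, \mu_n)\}$ sequences where (2.7) is
consistent the interior of the detectable region. We now begin proving a
rate characterization for the Gaussian location model by specializing Thm
3.1. Note that $L_n(x) = e^{\mu_n x - \frac{1}{2}\mu_n^2}$ and
$D_n^2 = e^{\mu_n^2} - 1$. A simple computation shows that the conditions
in the theorem can be rewritten as:

For all $\gamma > 0$ sufficiently small:

(3.26) $\frac{1}{e^{\mu_n^2} - 1}\left[ e^{\mu_n^2}\, Q\left(-\tfrac{3}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) - 2Q\left(-\tfrac{1}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) + Q\left(\tfrac{1}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) \right] \to 0$

(3.27) $\epsilon_n^2\left(e^{\mu_n^2} - 1\right) \to 0$

(3.28) $n\epsilon_n^2\left(e^{\mu_n^2} - 1\right) \to \infty$

where $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\, dt$.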

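Corollary 3.5. (Dense case) If $\epsilon_n = n^{-\beta}$ for
$\beta \in (0, \frac{1}{2})$ and $\mu_n = \frac{h(n)}{n^{\frac{1}{2} - \beta}}$
where $h(n) \to \infty$ and
$\limsup_{n\to\infty} \frac{\mu_n}{\sqrt{\frac{2}{3}\beta \log n}} < 1$, then

(3.29) $\lim_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n^2\left(e^{\mu_n^2} - 1\right)} = -\frac{1}{8}.$

If $\mu_n \to 0$, (3.29) can be rewritten as

(3.30) $\lim_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n^2 \mu_n^2} = -\frac{1}{8}.$

This result holds when replacing $P_{FA}(n)$ with $P_{MD}(n)$.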


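Proof. It is easy to verify (3.27) and (3.28) directly, and (3.26) if
$\mu_n$ does not tend to zero. To verify (3.26) when $\mu_n \to 0$, it
suffices to show that, for any $\alpha \in \mathbb{R}$,
$\frac{Q\left(\alpha\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right)\right)}{e^{\mu_n^2} - 1} \to 0$.
Since $e^x - 1 \geq x$, it suffices to show that
$\frac{Q\left(\alpha\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right)\right)}{\mu_n^2} \to 0$.
This can be verified by the standard bound $Q(x) \leq e^{-\frac{1}{2}x^2}$
for $x > 0$, and noting that
$\alpha\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right) > 0$
for sufficiently large $n$ and that $\frac{e^x - 1}{x} \to 1$ as $x \to 0$. □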


The implication of this corollary is that our rate characterization of the
probabilities of error holds for a large portion of the detectable region up
to the detection boundary, as $h(n)$ can be taken such that
$\frac{h(n)}{n^{\xi}} \to 0$ for any $\xi > 0$, making it negligible with
respect to $\mu_{crit,n}$ in Thm 3.4.


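Corollary 3.6. (Moderately sparse case) If $\epsilon_n = n^{-\beta}$ for
$\beta \in (\frac{1}{2}, \frac{3}{4})$ and
$\mu_n = \sqrt{2(\beta - \frac{1}{2} + \xi) \log n}$ for any
$0 < \xi < \frac{3 - 4\beta}{6}$, then

(3.31) $\lim_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n^2\left(e^{\mu_n^2} - 1\right)} = -\frac{1}{8}$

and the same result holds replacing $P_{FA}(n)$ with $P_{MD}(n)$.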


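Proof. It is easy to verify (3.27) and (3.28) directly. To verify (3.26),
note since $Q(\cdot) \leq 1$ and $\mu_n \to \infty$, we need

$\frac{1}{e^{\mu_n^2} - 1}\left\{ e^{\mu_n^2}\, Q\left(-\tfrac{3}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) - 2Q\left(-\tfrac{1}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) + Q\left(\tfrac{1}{2}\mu_n + \tfrac{1}{\mu_n}\log\left(1 + \tfrac{\gamma}{\epsilon_n}\right)\right) \right\} \to 0.$

Thus, it suffices to show that
$Q\left(-\frac{3}{2}\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right)\right) \to 0$,
or equivalently, that
$-\frac{3}{2}\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right) \to \infty$
for any fixed $\gamma > 0$. Applying
$\log\left(1 + \frac{\gamma}{\epsilon_n}\right) \geq \log\left(\frac{\gamma}{\epsilon_n}\right) = \beta \log n + \log \gamma$
shows that

(3.32) $-\frac{3}{2}\mu_n + \frac{1}{\mu_n}\log\left(1 + \frac{\gamma}{\epsilon_n}\right) \geq -\frac{3}{2}\sqrt{2(\beta - \tfrac{1}{2} + \xi)\log n} + \frac{\beta \log n + \log \gamma}{\sqrt{2(\beta - \tfrac{1}{2} + \xi)\log n}}$

$\qquad = \left(-\frac{3}{2}\sqrt{2(\beta - \tfrac{1}{2} + \xi)} + \frac{\beta}{\sqrt{2(\beta - \tfrac{1}{2} + \xi)}}\right)\sqrt{\log n} + \frac{\log \gamma}{\sqrt{2(\beta - \tfrac{1}{2} + \xi)\log n}}$

where the last term tends to 0 with $n$. Thus, (3.32) tends to infinity if
the coefficient of $\sqrt{\log n}$ is positive, i.e. if
$\frac{1}{2}(1 - 2\xi) < \beta < \frac{1}{4}(3 - 6\xi)$, which holds by the
definition of $\xi$. Thus, (3.32) tends to infinity and (3.26) is proved. □

Note that $\xi$ can be replaced with an appropriately chosen sequence
tending to 0 such that (3.27) and (3.28) hold. For
$\mu_n > \sqrt{\frac{2}{3}\beta \log n}$, (3.26) does not hold. However, Thm
3.3 and Thm 3.2 provide a partial rate characterization for the case where
$\mu_n$ grows faster than $\sqrt{2\beta \log n}$, which we present in the
following corollary.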


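Corollary 3.7. If $\epsilon_n = n^{-\beta}$ for $\beta \in (0,1)$ and
$\liminf_{n\to\infty} \frac{\mu_n}{\sqrt{2\beta \log n}} > 1$, then

(3.33) $\lim_{n\to\infty} \frac{\log P_{MD}(n)}{n\epsilon_n} = -1.$

If $\frac{n\epsilon_n}{\mu_n^2} \to \infty$, then

(3.34) $\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n} = -1.$

Otherwise, if $\frac{n\epsilon_n}{\mu_n^2} \to 0$, then

(3.35) $\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{\mu_n^2} \leq -\frac{1}{8}.$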


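Proof. The condition for Thm 3.3 given by (3.19) is

$Q\left(\frac{1}{\mu_n}\log\left(1 + \frac{M}{\epsilon_n}\right) - \frac{1}{2}\mu_n\right) \to 1.$

This holds if
$\frac{1}{\mu_n}\log\left(1 + \frac{M}{\epsilon_n}\right) - \frac{1}{2}\mu_n \to -\infty$,
which is true if $r > \beta$ when $\mu_n = \sqrt{2r \log n}$.

To show that
$\liminf_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n} \geq -1$ if
$\frac{n\epsilon_n}{\mu_n^2} \to \infty$, we can apply an argument similar to
the lower bound for Thm 3.1 to the lower bound given by (3.18); it is thus
omitted. Instead, we show a short proof of
$\liminf_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n} \geq -C$ for
$C \geq 1$ using (3.18). Note that we can loosen (3.18) to

$P_{FA}(n) \geq P_0\left[\sum_{i=1}^{k} \log(1 - \epsilon_n) + \sum_{i=k+1}^{n} \log\left(\epsilon_n L_n(X_i)\right) \geq 0\right]$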


for any $k$ and explicitly compute a lower bound to $P_{FA}(n)$ in terms of
the standard normal cumulative distribution function. Optimizing this bound
over the choice of $k$ establishes that
$\liminf_{n\to\infty} \frac{\log P_{FA}(n)}{n\epsilon_n} \geq -C$ for some
constant $C \geq 1$ (with $C = 1$ if $\frac{\mu_n}{\sqrt{\log n}} \to \infty$).
The lower bounding of (3.18) in a manner similar to Thm 3.1 recovers the
correct constant when $\mu_n$ scales as $\sqrt{2r \log n}$.

To see that the log-false alarm probability scales faster than
$n\epsilon_n$ when $\frac{n\epsilon_n}{\mu_n^2} \to 0$, one can apply (3.17).
In this case,

$\log P_{FA}(n) \leq \log\left(1 - \left(1 - Q\left(\tfrac{1}{2}\mu_n\right)\right)^n\right).$

Applying the standard approximation

(3.36) $\frac{x e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}(1 + x^2)} \leq Q(x) \leq \frac{e^{-\frac{1}{2}x^2}}{x\sqrt{2\pi}}$ for $x > 0$,

we see $\limsup_{n\to\infty} \frac{\log P_{FA}(n)}{\mu_n^2} \leq -\frac{1}{8}$. □
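The Gaussian tail bounds $\frac{x e^{-x^2/2}}{\sqrt{2\pi}(1+x^2)} \leq Q(x) \leq \frac{e^{-x^2/2}}{x\sqrt{2\pi}}$ and the resulting behavior of the bound $1 - (1 - Q(\mu_n/2))^n$ can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and `log_pfa_upper` evaluates the logarithm of that bound in a numerically stable way for very small tail probabilities.

```python
import math

def Q(x):
    """Standard normal upper tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_lower(x):
    """Lower bound x exp(-x^2/2) / (sqrt(2 pi) (1 + x^2)), valid for x > 0."""
    return x * math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * (1.0 + x * x))

def q_upper(x):
    """Upper bound exp(-x^2/2) / (x sqrt(2 pi)), valid for x > 0."""
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))

def log_pfa_upper(n, mu):
    """log(1 - (1 - Q(mu/2))^n), using log1p/expm1 to avoid cancellation."""
    q = Q(0.5 * mu)
    return math.log(-math.expm1(n * math.log1p(-q)))

for x in (0.1, 1.0, 3.0, 6.0, 10.0):
    assert q_lower(x) <= Q(x) <= q_upper(x)

# When mu^2 dominates log n, the normalized bound is close to -1/8.
for mu in (20.0, 40.0, 60.0):
    print(mu, log_pfa_upper(1000, mu) / mu**2)
```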


Note that (3.35) shows an asymmetry between the rates for the miss
detection and false alarm probabilities, since there is a fundamental lower
bound due to the sparsity under the alternative for the miss probability, but
not under the null.

Theorems 3.1 and 3.3 do not hold when $\epsilon_n = n^{-\beta}$ and
$\mu_n = \sqrt{2r \log n}$ where $r \in (\frac{\beta}{3}, \beta)$ for
$\beta \in (0, \frac{3}{4})$ or $r \in ((1 - \sqrt{1 - \beta})^2, \beta)$ for
$\beta \in (\frac{3}{4}, 1)$. For the remainder of the detectable region, we
have an upper bound on the rate derived specifically for the Gaussian
location setting. One can think of this as a case of "moderate signals".


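Theorem 3.8. Let $\epsilon_n = n^{-\beta}$ and $\mu_n = \sqrt{2r \log n}$
where $r \in (\frac{\beta}{3}, \beta)$ for $\beta \in (0, \frac{3}{4})$ or
$r \in ((1 - \sqrt{1 - \beta})^2, \beta)$ for $\beta \in (\frac{3}{4}, 1)$. Then,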


(3.37) 


lim sup 

n^oo 


logPFA(n) 




where $\Phi(x) = 1 - Q(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\, dt$
denotes the standard normal cumulative distribution function.

Moreover, (3.37) holds replacing $P_{FA}$ with $P_{MD}$.

Proof. The proof is based on a Chernoff bound with $s = \frac{1}{2}$. Details
are given in the Supplemental Material. □


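It is useful to note that
$n\epsilon_n^2 e^{\mu_n^2} \Phi\left(\left(\frac{\beta}{2r} - \frac{3}{2}\right)\mu_n\right)$
behaves on the order of
$\frac{n^{1 - 2\beta + 2r - r\left(1.5 - \frac{\beta}{2r}\right)^2}}{\sqrt{2r \log n}}$
for large $n$ in Thm 3.8.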
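The polynomial exponent $1 - 2\beta + 2r - r\left(1.5 - \frac{\beta}{2r}\right)^2$ in this remark is simple to evaluate. The following sketch (with a hypothetical function name) computes it and shows that it interpolates between the weak-signal exponent $1 - 2\beta + 2r$ at $r = \frac{\beta}{3}$ and the strong-signal exponent $1 - \beta$ at $r = \beta$.

```python
def moderate_rate_exponent(beta, r):
    """Exponent 1 - 2*beta + 2*r - r*(1.5 - beta/(2r))^2 of n in the remark."""
    a = 1.5 - beta / (2.0 * r)
    return 1.0 - 2.0 * beta + 2.0 * r - r * a * a

beta = 0.6
for r in (beta / 3.0, 0.3, 0.45, beta):
    print(r, moderate_rate_exponent(beta, r))
```

At $r = \frac{\beta}{3}$ the squared term vanishes, and at $r = \beta$ it equals $\beta$, which recovers the exponents of $n\epsilon_n^2 e^{\mu_n^2}$ and $n\epsilon_n$ respectively.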
4. Rates and Adaptive Testing in the Gaussian Location Model.

No adaptive tests prior to this work have had precise rate characterization.
Moreover, optimally adaptive tests for $0 < \beta < 1$ such as the Higher
Criticism (HC) [11] test or the sign test of Arias-Castro and Wang (ACW)
([1], Sec. 1.4)^1 are not amenable to rate analysis based on current
analysis techniques. This is due to the fact that the consistency proofs of
these tests follow from constructing functions of order statistics that grow
slowly under the null and slightly quicker under the alternative via a
result of Darling and Erdős [7]. We therefore analyze the max test:


(4.1) $\delta_{\max}(X_1, \dots, X_n) = \begin{cases} 1 & \max_{i=1,\dots,n} X_i \geq \tau_n \\ 0 & \text{otherwise} \end{cases}$

where $\tau_n$ is a sequence of test thresholds.
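The max test in (4.1) takes only a few lines to implement. In the sketch below the function name is hypothetical, and the default threshold $\sqrt{2 \log n}$ is the choice analyzed in Thm 4.1; any other threshold sequence can be passed explicitly.

```python
import math

def max_test(xs, tau=None):
    """Max test (4.1): return 1 if the sample maximum is at least tau, else 0."""
    if tau is None:
        # Default threshold sqrt(2 log n), as in Thm 4.1.
        tau = math.sqrt(2.0 * math.log(len(xs)))
    return 1 if max(xs) >= tau else 0
```

The test needs a single pass over the data and no knowledge of the sparsity level or signal strength, which is what makes it adaptive.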

While the max test is not consistent everywhere (2.7) is [1, 11], it has a few
advantages over other tests that are adaptive to all possible
$\{(\epsilon_n, \mu_n)\}$ (i.e. optimally adaptive). The first advantage is
practical: the max test requires a linear search and trivial storage to find
the largest element in a sample, whereas computing the HC or ACW test requires
on the order of $n \log n$ operations to compute the order statistics of a
sample of size $n$ (which may lead to non-trivial auxiliary storage
requirements), along with computations depending on Q-functions or partial
sums of the signs of the data. Moreover, the max test has been shown to work
in applications such as astrophysics [5]. It does not require specifying the
null distribution, which allows it to be applied to the Subbotin location
models as in [1]. The second advantage is analytical, as the cumulative
distribution function of the maximum of an i.i.d. sample of size $n$ with
cumulative distribution function $F(x)$ has the simple form $F(x)^n$. This
also provides a simple way to set the test threshold to meet a pre-specified
false alarm probability for a given sample size $n$. As most applications
focus on the regime where $\epsilon_n = n^{-\beta}$ for $\beta > 1/2$, the
following theorem shows the max test provides a simple test with rate
guarantees for almost the entire detectable region in this case.
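The threshold-setting remark above can be illustrated with a minimal sketch. It assumes a standard normal null, so that $F = \Phi$; the function names `max_test_threshold` and `max_test` are illustrative and do not come from the paper.

```python
from statistics import NormalDist

# Assumption for this sketch: the null distribution is N(0, 1).
STD_NORMAL = NormalDist()


def max_test_threshold(n: int, alpha: float) -> float:
    """Solve 1 - Phi(tau)^n = alpha for tau, using the CDF of the maximum."""
    return STD_NORMAL.inv_cdf((1.0 - alpha) ** (1.0 / n))


def max_test(sample, tau: float) -> int:
    """Return 1 (reject the null) if the largest observation exceeds tau, else 0."""
    return 1 if max(sample) > tau else 0
```

The decision itself needs a single pass over the data and constant extra storage, which is the practical advantage described above.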


Theorem 4.1. For the max test given by (4.1) with threshold
$\tau_n = \sqrt{2 \log n}$:

The rate under the null is given by

$$\lim_{n \to \infty} \frac{\log P_{\mathrm{FA}}(n)}{\log \log n} = -\frac{1}{2}. \tag{4.2}$$

^We avoid the use of the acronym CUSUM since it is reserved for the most popular 
test for the quickest change detection problem in Sequential Analysis. 
Under the alternative, if
$\liminf_{n \to \infty} \mu_n / \sqrt{2 (1 - \sqrt{1 - \beta})^2 \log n} > 1$
with $\epsilon_n = n^{-\beta}$,

$$\lim_{n \to \infty} \frac{\log P_{\mathrm{MD}}(n)}{n \epsilon_n Q(\sqrt{2 \log n} - \mu_n)} = -1. \tag{4.3}$$

In particular, if $\liminf_{n \to \infty} \mu_n / \sqrt{2 \log n} > 1$, the max
test achieves the optimal rate under the alternative

$$\lim_{n \to \infty} \frac{\log P_{\mathrm{MD}}(n)}{n \epsilon_n} = -1. \tag{4.4}$$

Otherwise, the max test is not consistent.


Proof. The error probabilities for the max test given by (4.1) with
threshold $\tau_n$,

$$P_{\mathrm{FA}}(n) = 1 - \Phi(\tau_n)^n \tag{4.5}$$

$$P_{\mathrm{MD}}(n) = \left( (1 - \epsilon_n) \Phi(\tau_n) + \epsilon_n \Phi(\tau_n - \mu_n) \right)^n \tag{4.6}$$

follow from the cumulative distribution function of the maximum of an i.i.d.
sample. The rates (4.2), (4.3), (4.4) as well as the condition for inconsistency
are derived by applying the approximation (3.36) to (4.5) and (4.6). □
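As a hedged numerical companion to the proof, the closed forms for the false alarm and miss detection probabilities of the max test can be evaluated directly. The sketch below assumes the Gaussian location model with unit variance; the helper names are illustrative, and `log1p`/`expm1` are used only to keep precision when the per-sample tail probability is tiny.

```python
import math


def q_function(x: float) -> float:
    """Standard normal tail probability Q(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def max_test_pfa(n: int, tau: float) -> float:
    """P_FA(n) = 1 - Phi(tau)^n, computed as -expm1(n * log(1 - Q(tau)))."""
    return -math.expm1(n * math.log1p(-q_function(tau)))


def max_test_pmd(n: int, tau: float, eps: float, mu: float) -> float:
    """P_MD(n) = ((1 - eps) Phi(tau) + eps Phi(tau - mu))^n."""
    # Probability that a single observation from the mixture exceeds tau.
    exceed_one = (1.0 - eps) * q_function(tau) + eps * q_function(tau - mu)
    return math.exp(n * math.log1p(-exceed_one))
```

Evaluating `max_test_pfa(n, math.sqrt(2 * math.log(n)))` for growing `n` suggests that the ratio in the null rate approaches its limit slowly, since $\log P_{\mathrm{FA}}(n)$ carries an additive constant next to the $\log \log n$ term.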



Fig 2: Detectable region of the Max test. White denotes where detection is 
impossible for any test. Black denotes where the max test is inconsistent. 
Green denotes where the max test is consistent, but has suboptimal rate 
under the alternative compared to (2.7). Blue denotes where the max test 
achieves the optimal rate under the alternative. Compare to Fig. 1a.
The results of Thm. 4.1 are summarized in Fig. 2. In particular, if we take
$\mu_n = \sqrt{2 r \log n}$ with $r \in ((1 - \sqrt{1 - \beta})^2, 1)$, we see
$\log P_{\mathrm{MD}}(n)$ scales on the order of
$-n^{1 - \beta - (1 - \sqrt{r})^2} / \sqrt{\log n}$. This is suboptimal compared
to the rates achieved by the (non-adaptive) likelihood ratio test (2.7), but is
of polynomial order (up to a sub-logarithmic factor). Note that the rate of
decay of the sum error probability can be slower than that of the miss
detection probability, since the false alarm probability is fixed by the choice
of threshold, independent of the true $\{(\epsilon_n, \mu_n)\}$ for adaptivity.
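The polynomial order quoted for the max test can be traced with a short worked computation. It substitutes $\mu_n = \sqrt{2 r \log n}$ and $\epsilon_n = n^{-\beta}$ into the rate under the alternative from Thm. 4.1, and uses the standard tail approximation $Q(x) \approx e^{-x^2/2} / (x \sqrt{2\pi})$ for large $x$. The bookkeeping of constants is a sketch under that approximation, not a statement taken from the theorem.

```latex
\begin{align*}
\sqrt{2 \log n} - \mu_n &= (1 - \sqrt{r}) \sqrt{2 \log n}, \\
Q\big( (1 - \sqrt{r}) \sqrt{2 \log n} \big)
  &\approx \frac{n^{-(1 - \sqrt{r})^2}}{(1 - \sqrt{r}) \sqrt{4 \pi \log n}}, \\
\log P_{\mathrm{MD}}(n)
  &\approx -n \epsilon_n Q(\sqrt{2 \log n} - \mu_n)
  \approx -\frac{n^{1 - \beta - (1 - \sqrt{r})^2}}{(1 - \sqrt{r}) \sqrt{4 \pi \log n}}.
\end{align*}
```

The exponent $1 - \beta - (1 - \sqrt{r})^2$ is positive exactly when $r > (1 - \sqrt{1 - \beta})^2$, which matches the lower end of the range of $r$ considered here.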

5. Numerical Experiments. In this section, we provide numerical
simulations to verify the rate characterization developed for the Gaussian
location model as well as some results comparing the performance of adaptive
tests.

5.1. Rates for the Likelihood Ratio Test. We first consider the dense case,
with $\epsilon_n = n^{-0.4}$ and $\mu_n = 1$. The conditions of Cor. 3.5 apply
here, and we expect
$\frac{\log P_{\mathrm{FA}}(n)}{n \epsilon_n^2 (e^{\mu_n^2} - 1)} \to -\frac{1}{8}$.
Simulations were done using direct Monte Carlo simulation with 10000 trials for
the errors for $n \le 10^6$. Importance sampling via the hypothesis alternate
to the true hypothesis (i.e. $H_{0,n}$ for simulating $P_{\mathrm{MD}}(n)$,
$H_{1,n}$ for simulating $P_{\mathrm{FA}}(n)$) was used for
$10^6 < n \le 2 \times 10^7$ with between 10000 and 15000 data points. The
performance of the test given by (2.7) is shown in Fig. 3a. The dashed lines
are the best fit lines between the log-error probabilities and
$n \epsilon_n^2 (e^{\mu_n^2} - 1)$ using data for $n \ge 350000$. By Cor. 3.5,
we expect the slope of the best fit lines to be approximately $-\frac{1}{8}$.
This is the case, as the line corresponding to missed detection has slope
-0.13 and the line corresponding to false alarm has slope -0.12.

(a) Simulations of error probabilities in the Gaussian location model with
$\mu_n = 1$, $\epsilon_n = n^{-0.4}$ for the test (2.7). A best fit line for
$\log P_{\mathrm{MD}}(n)$ is given as a blue dashed line and the corresponding
line for $\log P_{\mathrm{FA}}(n)$ is given as a red dot-dashed line.

(b) Simulations of error probabilities in the Gaussian location model with
$\mu_n = \sqrt{2 (0.19) \log n}$, $\epsilon_n = n^{-0.6}$ for the test (2.7).
A best fit line for $\log P_{\mathrm{MD}}(n)$ is given as a blue dashed line
and the corresponding line for $\log P_{\mathrm{FA}}(n)$ is given as a red
dot-dashed line.

Fig 3: Simulation results for Cor. 3.5 and 3.6

The moderately sparse case with $\epsilon_n = n^{-0.6}$ and
$\mu_n = \sqrt{2 (0.19) \log n}$ is shown in Fig. 3b. The conditions of
Cor. 3.6 apply here, and we expect
$\frac{\log P_{\mathrm{FA}}(n)}{n \epsilon_n^2 (e^{\mu_n^2} - 1)} \to -\frac{1}{8}$.
Simulations were performed identically to the dense case. The dashed lines are
the best fit lines between the log-error probabilities and
$n \epsilon_n^2 (e^{\mu_n^2} - 1)$ using data for $n \ge 100000$. By Cor. 3.6,
we expect the slope of the best fit lines to be approximately $-\frac{1}{8}$.
Both best fit lines have slope of -0.11. It is important to note that
$P_{\mathrm{FA}}(n)$ and $P_{\mathrm{MD}}(n)$ are both large even at
$n = 2 \times 10^7$, and simulation to larger sample sizes should show better
agreement with Cor. 3.6.
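A minimal sketch of the direct Monte Carlo part of such an experiment is given below. It is not the authors' simulation code: it assumes that test (2.7) rejects the null when the log-likelihood ratio is non-negative, uses the Gaussian location likelihood ratio $L_n(x) = \exp(\mu_n x - \mu_n^2/2)$, and omits the importance sampling used for large $n$. All function names are illustrative.

```python
import math
import random


def llr(sample, eps: float, mu: float) -> float:
    """Sum of log(1 + eps * (L(x) - 1)) with L(x) = exp(mu * x - mu^2 / 2)."""
    return sum(math.log1p(eps * math.expm1(mu * x - 0.5 * mu * mu)) for x in sample)


def simulate_lrt_errors(n: int, eps: float, mu: float, trials: int, seed: int = 0):
    """Estimate (P_FA, P_MD) of the zero-threshold LRT by direct simulation."""
    rng = random.Random(seed)
    false_alarms = 0
    misses = 0
    for _ in range(trials):
        # Null: pure N(0, 1) noise.
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if llr(null_sample, eps, mu) >= 0.0:
            false_alarms += 1
        # Alternative: each observation carries the signal with probability eps.
        alt_sample = [
            rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)
        ]
        if llr(alt_sample, eps, mu) < 0.0:
            misses += 1
    return false_alarms / trials, misses / trials
```

Plotting the logarithm of each estimate against $n \epsilon_n^2 (e^{\mu_n^2} - 1)$ over a range of $n$ would give the kind of best fit line described above, subject to the Monte Carlo error of the estimates.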


5.2. Adaptive Testing. In order to implement an adaptive test, the threshold
for the test statistic must be chosen in order to achieve a target false
alarm probability. This can be done analytically for the max test by inverting
(4.5). For other tests, which do not have tractable expressions for the
false alarm probability, we set the threshold by simulating the test statistic
under the null. The threshold is chosen such that the empirical fraction of
exceedances of the threshold matches the desired false alarm probability. As
expected, the adaptive tests cannot match the rate under the null with
non-trivial behavior under the alternative, and therefore we report the results
for adaptive tests at the standard 0.05 and 0.10 levels. The miss detection
probabilities reported for the max test were computed analytically via (4.6).
Note that the likelihood ratio test (2.7) with threshold set to meet a given
false alarm level is the oracle test which minimizes the miss detection
probability [8].
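The simulation-based threshold choice can be sketched as an empirical quantile of the statistic under the null. The sketch assumes a standard normal null and an arbitrary `statistic` callable; the function names and the tie-free quantile rule are illustrative choices, not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def empirical_threshold(statistic, n: int, alpha: float, trials: int, seed: int = 0):
    """Return (threshold, sorted null statistics) with exceedance fraction <= alpha."""
    rng = random.Random(seed)
    null_stats = sorted(
        statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)
    )
    # Index of the empirical (1 - alpha) quantile, clamped to a valid position.
    k = min(trials - 1, max(0, math.ceil((1.0 - alpha) * trials) - 1))
    return null_stats[k], null_stats


def analytic_max_threshold(n: int, alpha: float) -> float:
    """Exact threshold for the max statistic under N(0, 1), for comparison."""
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / n))
```

For the max statistic (`statistic=max`) the simulated threshold can be compared with the analytic one, which gives a quick sanity check of the procedure before applying it to statistics without a tractable null distribution.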

As multiple definitions of the Higher Criticism test exist in the literature,
we use the following version from [3]: Given a sample $X_1, \ldots, X_n$, let
$p_i = Q(X_i)$ for $1 \le i \le n$. Let $\{p_{(i)}\}$ denote $\{p_i\}$ sorted
in ascending order. Then, the higher criticism statistic is given by

$$\mathrm{HC}^* = \max_{1 \le i \le n} \mathrm{HC}_{n,i}, \quad \text{where} \quad \mathrm{HC}_{n,i} = \sqrt{n} \, \frac{i/n - p_{(i)}}{\sqrt{p_{(i)} (1 - p_{(i)})}}, \tag{5.1}$$

and the null hypothesis is rejected when $\mathrm{HC}^*$ is large. The HC test
is optimally adaptive, i.e. is consistent whenever (2.7) is.
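A minimal sketch of the Higher Criticism statistic as described above follows. It assumes a standard normal null for the p-values and takes the maximum over all indices $1 \le i \le n$, which is an assumption about the index range; the function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Standard normal tail probability Q(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def higher_criticism(sample) -> float:
    """Maximum over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    n = len(sample)
    p_sorted = sorted(q_function(x) for x in sample)  # p-values, ascending
    best = -math.inf
    for i, p in enumerate(p_sorted, start=1):
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid dividing by zero
            hc_i = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, hc_i)
    return best
```

The sort dominates the cost, in line with the $n \log n$ operation count mentioned for order-statistic based tests.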
The ACW test [1] is implemented as follows: Given the samples
$X_1, \ldots, X_n$, let $X_{[i]}$ denote the $i$-th largest sample by absolute
value. Then,

$$S^* = \max_{1 \le k \le n} \frac{1}{\sqrt{k}} \sum_{i=1}^{k} \mathrm{sign}(X_{[i]}), \tag{5.2}$$

and the null hypothesis is rejected when $S^*$ is large. The ACW test is
adaptive for $\beta > 1/2$. It is unknown how the ACW test behaves for
$\beta \le 1/2$. Note that like the Max test (and unlike the HC test), the ACW
test does not exploit exact knowledge of the null distribution (but assumes
continuity and symmetry about zero).
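The sign-based statistic can be sketched in a few lines. The sketch assumes the one-sided form in which signed partial sums of the signs, ordered by decreasing absolute value, are normalized by $\sqrt{k}$; the function name is illustrative.

```python
import math


def acw_statistic(sample) -> float:
    """Max over k of (sum of signs of the k largest-|x| samples) / sqrt(k)."""
    ordered = sorted(sample, key=abs, reverse=True)  # largest absolute value first
    best = -math.inf
    partial = 0
    for k, x in enumerate(ordered, start=1):
        partial += (x > 0) - (x < 0)  # sign of x, with sign(0) = 0
        best = max(best, partial / math.sqrt(k))
    return best
```

Only the signs and the ordering by absolute value enter the statistic, which is why no exact knowledge of the null distribution is needed beyond symmetry about zero.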

The performance of test (2.7) is summarized in Table 1, and a comparison
of adaptive tests in the moderately sparse example from the previous section
is given in Table 2. We used 115000 realizations of the null and alternative.
The sample sizes illustrated were chosen to be comparable with applications
of sparse mixture detection, such as the WMAP data in [5] which has
$n \approx 7 \times 10^4$. Thus, our simulations provide evidence for both
larger and smaller sample sizes than used in practice. We see there is a large
gap in performance between the likelihood ratio test (2.7) and the adaptive
tests, but the Higher Criticism test performs significantly better than the
Max or ACW tests.


                 LRT
  n        P_FA(n)    P_MD(n)
  10        0.307      0.388
  10^2      0.258      0.320
  10^3      0.213      0.256
  10^4      0.166      0.193
  10^5      0.119      0.134
  10^6      0.074      0.084

Table 1
Error probabilities for $\mu_n = \sqrt{2 (0.19) \log n}$, $\epsilon_n = n^{-0.6}$ for the LRT given by (2.7).




                    P_FA = 0.05                        P_FA = 0.10
  n        LRT     Max     HC      ACW       LRT     Max     HC      ACW
  10       0.776   0.845   0.790   0.807     0.665   0.744   0.666   0.706
  10^2     0.667   0.814   0.775   0.816     0.542   0.704   0.630   0.722
  10^3     0.548   0.789   0.728   0.792     0.417   0.672   0.561   0.653
  10^4     0.403   0.762   0.688   0.751     0.281   0.639   0.491   0.617
  10^5     0.252   0.733   0.623   0.685     0.158   0.603   0.396   0.539
  10^6     0.119   0.699   0.546   0.602     0.064   0.562   0.295   0.446

Table 2
Miss Detection probabilities for $\mu_n = \sqrt{2 (0.19) \log n}$, $\epsilon_n = n^{-0.6}$, for False Alarm probability 0.05 and 0.10.
                 LRT
  n        P_FA(n)    P_MD(n)
  10       1.62e-1    2.75e-1
  10^2     6.31e-2    1.12e-1
  10^3     7.63e-3    1.36e-2
  10^4     5.38e-5    8.83e-5

Table 3
Error probabilities for $\mu_n = \sqrt{2 (0.66) \log n}$, $\epsilon_n = n^{-0.6}$ for the LRT given by (2.7).




                      P_FA = 0.05                              P_FA = 0.10
  n        LRT       Max       HC        ACW        LRT       Max       HC        ACW
  10       4.66e-1   5.66e-1   7.18e-1   5.88e-1    3.59e-1   4.36e-1   3.38e-1   5.88e-1
  10^2     1.28e-1   2.56e-1   6.24e-1   4.80e-1    8.45e-2   1.61e-1   1.07e-1   4.80e-1
  10^3     3.69e-3   4.40e-2   2.48e-2   1.33e-1    1.89e-3   1.80e-2   4.20e-3   1.33e-1
  10^4     2.12e-7   8.08e-4   < 1e-5    4.43e-3    7.10e-8   1.32e-4   < 1e-5    1.25e-3

Table 4
Miss Detection probabilities for $\mu_n = \sqrt{2 (0.66) \log n}$, $\epsilon_n = n^{-0.6}$ for False Alarm probability 0.05 and 0.10.


For the case of strong signals, we calibrate as
$\mu_n = \sqrt{2 (0.66) \log n}$ for $\epsilon_n = n^{-0.6}$. This corresponds
to the rates given by Thm. 3.3. The performance of test (2.7) is summarized in
Table 3, and a comparison of adaptive tests for this calibration is given in
Table 4. Here we used 180000 realizations of the null and alternative. As even
the max test has error probabilities sufficiently small for many applications
in this regime at moderate sample sizes (which are still on the order used in
applications [5]), we only consider sample sizes up to $n = 10^4$. We see that
in the strong signal case, the likelihood ratio test performs better than the
adaptive tests, but all tests produce sufficiently small error probabilities
for most applications.

6. Conclusions and Future Work. In this paper, we have presented
a rate characterization for the error probability decay with sample size
in a general mixture detection problem for the likelihood ratio test. In the
Gaussian location model, we explicitly showed that the rate characterization
holds for most of the detectable region. A partial rate characterization (an
upper bound on the rate under both hypotheses and a universal lower bound
on the rate under $H_{1,n}$) was provided for the remainder of the detectable
region. In contrast to usual large deviations results [6, 8] for the decay of
error probabilities, our results show that the log-probability of error decays
sublinearly with sample size.

There are several possible extensions of this work. One is to provide
corresponding lower bounds for the rate in cases not covered by Thm 3.1.
Another is to provide a general analysis of the behavior that is not covered
by Thm 3.1 and 3.3, present in Thm 3.8 in the Gaussian location model.
As noted in [3], in some applications it is natural to require
$P_{\mathrm{FA}}(n) \le \alpha$ for some fixed $\alpha > 0$, rather than
requiring $P_{\mathrm{FA}}(n) \to 0$. While Thm 3.4 shows the detectable region
is not enlarged in the Gaussian location model (and similarly for some general
models [4]), it is conceivable that the oracle optimal test which fixes
$P_{\mathrm{FA}}(n)$ (i.e. one which compares LLR(n) to a non-zero threshold)
can achieve a better rate for $P_{\mathrm{MD}}(n)$. It is expected that the
techniques developed in this paper extend to the case where
$P_{\mathrm{FA}}(n)$ is constrained to a level $\alpha$. In the Gaussian
location model, the analysis of (2.7) constrained to level $\alpha$ has been
studied in [16] via contiguity arguments.

Finally, it is important to develop tests that are amenable to a rate
analysis and are computationally simple to implement over $0 < \beta < 1$. In
the case of weak signals in the Gaussian location model, we see that the error
probabilities for the likelihood ratio test, which establish the fundamental
limit on error probabilities, decay quite slowly even with large sample sizes.
In this case, closing the gap between the likelihood ratio test and adaptive
tests is important for applications where it is desirable to have high power
tests. In the case of strong signals, we see the miss detection probability for
even the simplest adaptive test, the max test, is very small for moderate
sample sizes at standard false alarm levels, so the rate of decay is not as
important as in the weak signal case for applications.

SUPPLEMENTARY MATERIAL 

Supplemental Material for “Detecting Sparse Mixtures: Rate of 
Decay of Error Probability” 

(doi: COMPLETED BY THE TYPESETTER; .pdf). We provide details of 
proofs of the main theorems. 
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V. VeeravalliII 
7. Weak Signals: Supporting Lemmas. In this section, we provide
the proofs of the lemmas that are necessary for establishing the validity of
Theorem 3.1.


Lemma 7.1. Under the assumptions of Theorem 3.1, there exist positive
constants $C_1, C_2$ such that for sufficiently large $n$ we have

$$C_1 \epsilon_n^2 D_n^2 \ge \sigma_n^2 \ge C_2 \epsilon_n^2 D_n^2$$

where $\sigma_n^2$ is defined in (3.11).


Proof. We first show that for sufficiently large $n$,

$$12 \ge \frac{\Lambda_n(s_n) \sigma_n^2}{\epsilon_n^2 D_n^2}. \tag{7.1}$$

Note that

$$(\log(1 + x))^2 (1 + x)^s \le 2 x^2 \quad \text{for } s \in (0, 1),\ x \ge 1. \tag{7.2}$$

This follows from $0 \le \log(1 + x) \le \sqrt{x}$ for $x \ge 0$ and
$1 \le (1 + x)^s \le 2x$ for $x \ge 1$ and $s \in (0, 1)$. Also, note
$\Lambda_n(0) = \Lambda_n(1) = 1$, implying $s_n \in (0, 1)$ by convexity of
$\Lambda_n$ (Lemma 2.2.5, [8]).

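For shorthand, we will write $L_n = L_n(X_1)$. Then,

$$\begin{aligned}
\Lambda_n(s_n) \sigma_n^2 &= E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \right] \\
&= E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\quad + E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) < 1\}} \right].
\end{aligned} \tag{7.3}$$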


^Supported by the US National Science Foundation under grants CIF 1514245 and
CIF 1513373.
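We first consider
$E_0[(\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) \ge 1\}}]$.
By (7.2), we have on the event $\{\epsilon_n (L_n - 1) \ge 1\}$ that

$$(\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \le 2 (\epsilon_n (L_n - 1))^2.$$

Thus,

$$\begin{aligned}
E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) \ge 1\}} \right]
&\le E_0\left[ 2 (\epsilon_n (L_n - 1))^2 \mathbf{1}_{\{\epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\le 2 \epsilon_n^2 E_0\left[ (L_n - 1)^2 \right] = 2 \epsilon_n^2 D_n^2.
\end{aligned} \tag{7.4}$$

We now consider

$$E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) < 1\}} \right].$$

A simple calculus argument shows that $(\log(1 + x))^2 \le 5 x^2$ for
$x \ge -\frac{1}{2}$. Note that since $L_n \ge 0$, $-\epsilon_n \le \epsilon_n (L_n - 1)$.
Because $\epsilon_n \to 0$, for sufficiently large $n$ we have that
$\epsilon_n \le \frac{1}{2}$ and
$(\log(1 + \epsilon_n (L_n - 1)))^2 \le 5 (\epsilon_n (L_n - 1))^2$
holds. Also, $(1 + \epsilon_n (L_n - 1))^{s_n} \le 2^{s_n} \le 2$ on the event
$\{\epsilon_n (L_n - 1) < 1\}$. Thus,

$$\begin{aligned}
E_0\left[ (\log(1 + \epsilon_n (L_n - 1)))^2 (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\epsilon_n (L_n - 1) < 1\}} \right]
&\le E_0\left[ 10 (\epsilon_n (L_n - 1))^2 \mathbf{1}_{\{\epsilon_n (L_n - 1) < 1\}} \right] \\
&\le 10 E_0\left[ (\epsilon_n (L_n - 1))^2 \right] = 10 \epsilon_n^2 D_n^2.
\end{aligned} \tag{7.5}$$

Using (7.4), (7.5) in (7.3), we see for sufficiently large $n$ that

$$\Lambda_n(s_n) \sigma_n^2 \le 12 \epsilon_n^2 D_n^2,$$

establishing (7.1).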
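The two elementary inequalities used for the upper bound are described as simple calculus arguments. A quick numerical spot check on a grid, which is a sanity check rather than a proof, can be sketched as follows; the function names and grids are illustrative.

```python
import math


def check_bound_7_2(x_values, s_values) -> bool:
    """Check (log(1 + x))^2 * (1 + x)^s <= 2 * x^2 on the given grid."""
    return all(
        math.log1p(x) ** 2 * (1.0 + x) ** s <= 2.0 * x * x
        for x in x_values
        for s in s_values
    )


def check_log_square_bound(x_values) -> bool:
    """Check (log(1 + x))^2 <= 5 * x^2, with a tiny tolerance for rounding."""
    return all(math.log1p(x) ** 2 <= 5.0 * x * x + 1e-15 for x in x_values)


# Grid for x >= 1 with s in (0, 1), and grid for x >= -1/2.
xs_large = [1.0 + 0.5 * k for k in range(2000)]
ss = [0.01 * j for j in range(1, 100)]
xs_small = [-0.5 + 0.001 * k for k in range(1501)]
```

Calling `check_bound_7_2(xs_large, ss)` and `check_log_square_bound(xs_small)` exercises the two bounds over the ranges where the proof applies them.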

We now show that 
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Taking any 7 < from Equation (3.1) of Theorem 3.1, 


Kn{Sn)(Tn = Eq (log (1 + 6^ (U “ 1)))^ (1 + (-n (U “ 1))" 


(7.7) 

(7.8) 


(7.9) 


> Eo 


> Eo 


(log (1 + En (Ln - 1)))^ (1 + Cn (L„ - 1))^"l{e„(U-l)<7} 




T)2e2 


(log (1 + en (U - 1))) 1^2 J ^{cn(Ln-l)<7} 

(en (Ln — 1)) l{e„(U-l)<7} 


(en(Ln-l))' 

■^n 

(en(Ln-l))' 


1 


{e„(L„-l)<7} 


T)2 


l-En 


(Ln - 1 ) 

Dl 


{U<1+^} 

2 


-1 




Where (7.7) follows from (1 + en(\-n — 1))*" > (1 — Cn)^" > 1 — Cn > 5 for 
sufficiently large n, as Sn G (0,1) and Cn —)> 0. A simple calculus argument 
shows that < (log(l + x))^ for x G [—g]- This, along the fact that 
with —f < —Cn < en(Ln — 1) < 7 < | on the event {en(Ln — 1) < 7} for 
sufficiently large n establishes (7.8). The definition of furnishes (7.9). 
Noting that Eo[^^^ 2^1 {l„>i+^}] 0 by the assumptions of Thm 2.1 in 

the main text, (7.6) is established. 

In order to remove the $\Lambda_n(s_n)$ factor from the bounds, note that
$\Lambda_n(s_n) \le \Lambda_n(1) \le 1$ and that
$\Lambda_n(s) \ge (1 - \epsilon_n)^s \ge \frac{1}{2}$ for sufficiently large
$n$. This along with (7.1) and (7.6) establishes the lemma.

This lemma is established identically under $H_{1,n}$ by applying a change of
measure to $P_{0,n}$ (which replaces $s_n$ with $1 - s_n$ in the argument
above). □


Lemma 7.2. Under the assumptions of Theorem 3.1, if we use the tilted
measure $\tilde{P}$, we have

$$\tilde{P}[\mathrm{LLR}(n) \ge 0] \to \frac{1}{2} \tag{7.10}$$

as $n \to \infty$.


Proof. For the proof, we will need the Lindeberg-Feller Central Limit
Theorem, which is proved as Theorem 3.4.5 in [13]:
Theorem 7.3. For each $n$, let $Z_{n,i}$, $1 \le i \le n$, be independent
zero-mean random variables. Suppose

$$\lim_{n \to \infty} \sum_{i=1}^{n} E\left[ Z_{n,i}^2 \right] = \sigma^2 > 0 \tag{7.11}$$

and for all $\gamma > 0$,

$$\lim_{n \to \infty} \sum_{i=1}^{n} E\left[ Z_{n,i}^2 \mathbf{1}_{\{|Z_{n,i}| > \gamma\}} \right] = 0. \tag{7.12}$$

Then, $S_n = Z_{n,1} + \ldots + Z_{n,n}$ converges in distribution to the normal
distribution with mean zero and variance $\sigma^2$ as $n \to \infty$.

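Let us now continue with the proof of Lemma 7.2. We draw
$X_1, \ldots, X_n$ i.i.d. from $H_{0,n}$. Define for $1 \le i \le n$

$$\xi_{n,i} = \log(1 + \epsilon_n (L_n(X_i) - 1)), \qquad Z_{n,i} = \frac{\xi_{n,i}}{\sqrt{n} \sigma_n}. \tag{7.13}$$

Note that

$$\frac{\mathrm{LLR}(n)}{\sqrt{n} \sigma_n} = \sum_{i=1}^{n} \frac{\xi_{n,i}}{\sqrt{n} \sigma_n} = \sum_{i=1}^{n} Z_{n,i}. \tag{7.14}$$

We show $\sum_{i=1}^{n} Z_{n,i}$ converges to a standard normal distribution
under the tilted measure. As stated in the main text, the tilted expectation
satisfies $\tilde{E}[\xi_{n,1}] = 0$ and $\tilde{E}[\xi_{n,1}^2] = \sigma_n^2$.
Thus, (7.11) is satisfied with $\sigma^2 = 1$.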

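It remains to check (7.12). Since for fixed $n$, the $Z_{n,i}$ are i.i.d., it
suffices to verify that

$$\tilde{E}\left[ \frac{\xi_{n,1}^2}{\sigma_n^2} \mathbf{1}_{\{\xi_{n,1}^2 / \sigma_n^2 > n \gamma^2\}} \right] \to 0$$

as $n \to \infty$. To simplify notation, let $L_n = L_n(X_1)$. By Lemma 7.1, it
suffices to show that

$$\tilde{E}\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2\}} \right] \to 0,$$

which changing to the $P_0$ measure is equivalent to showing that for
$0 < \gamma < \gamma_0$

$$E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2\}} \right] \to 0 \tag{7.15}$$

since $\Lambda_n(s_n) \in [\frac{1}{2}, 1]$ for sufficiently large $n$.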
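We decompose (7.15) into

$$\begin{aligned}
&E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2\}} \right] \\
&\quad = E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\qquad + E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) < 1\}} \right]
\end{aligned} \tag{7.16}$$

and show that both parts in (7.16) tend to zero. For the first part, applying
(7.2) and $(\log(1 + x))^2 \le x^2$ for $x \ge 0$,

$$\begin{aligned}
&E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\quad \le E_0\left[ \frac{2 \epsilon_n^2 (L_n - 1)^2}{\epsilon_n^2 D_n^2} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\quad = 2 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) \ge 1\}} \right] \\
&\quad \le 2 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{L_n - 1 > \sqrt{C_2} \sqrt{n} D_n \gamma\}} \right] \\
&\quad \le 2 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{L_n \ge 1 + \gamma / \epsilon_n\}} \right]
\end{aligned}$$

where the last inequality holds because from $\sqrt{n} D_n \epsilon_n \to \infty$
we can conclude that for sufficiently large $n$ we have
$\sqrt{C_2} \sqrt{n} D_n \epsilon_n > 1$.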

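We now show that the second part in (7.16) tends to zero as well. We
observe that since $L_n \ge 0$ we have $-\epsilon_n \le \epsilon_n (L_n - 1)$.
Using $(\log(1 + x))^2 \le 5 x^2$ for $x \ge -\frac{1}{2}$ and that
$(1 + \epsilon_n (L_n - 1))^{s_n} \le 2^{s_n} \le 2$ on the event
$\{\epsilon_n (L_n - 1) < 1\}$, we see that for $n$ sufficiently large such
that $\epsilon_n \le \frac{1}{2}$,

$$\begin{aligned}
&E_0\left[ \frac{\xi_{n,1}^2}{\epsilon_n^2 D_n^2} (1 + \epsilon_n (L_n - 1))^{s_n} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) < 1\}} \right] \\
&\quad \le 10 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{\xi_{n,1}^2 / (C_2 \epsilon_n^2 D_n^2) > n \gamma^2,\ \epsilon_n (L_n - 1) < 1\}} \right] \\
&\quad \le 10 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{|L_n - 1| > \sqrt{C_2 / 5} \sqrt{n} D_n \gamma\}} \right] \\
&\quad = 10 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{L_n > 1 + \sqrt{C_2 / 5} \sqrt{n} D_n \gamma\}} \right] \\
&\quad \le 10 E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{L_n \ge 1 + \gamma / \epsilon_n\}} \right].
\end{aligned}$$

The last equality follows from the fact that
$\sqrt{n} \epsilon_n D_n \to \infty$, since this implies that
$\sqrt{n} D_n \to \infty$, so that for large enough $n$ we cannot have
$1 - L_n > \sqrt{C_2 / 5} \sqrt{n} D_n \gamma$ but only
$L_n - 1 > \sqrt{C_2 / 5} \sqrt{n} D_n \gamma$. Finally, the last inequality is
true for large enough $n$ such that $\sqrt{C_2 / 5} \sqrt{n} D_n \epsilon_n > 1$,
which is always possible since this quantity tends to infinity because of our
assumption in (3.3). Thus, (7.15) holds and the Lindeberg-Feller CLT shows
that $\frac{\mathrm{LLR}(n)}{\sqrt{n} \sigma_n}$ converges to a standard normal
distribution under the tilted measure. Therefore,

$$\tilde{P}[\mathrm{LLR}(n) \ge 0] = \tilde{P}\left[ \frac{\mathrm{LLR}(n)}{\sqrt{n} \sigma_n} \ge 0 \right] \to \frac{1}{2} \tag{7.17}$$

as $n \to \infty$, establishing the lemma.

Verifying the Lindeberg-Feller CLT conditions for analyzing
$P_{\mathrm{MD}}$ is done by changing from the $P_1$ to the $P_0$ measure. □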


Lemma 7.4. Under the assumptions of Theorem 3.1 we have

$$\liminf_{n \to \infty} \frac{\log \Lambda_n(s_n)}{\epsilon_n^2 D_n^2} \ge -\frac{1}{8}. \tag{7.18}$$

Proof. Consider the function $(1 + x)^s$ for $s \in (0, 1)$ and
$x \in [-\gamma, \gamma]$ where $0 < \gamma < 1$. Then

$$\begin{aligned}
(1 + x)^s &= 1 + s x + \frac{1}{2} s (s - 1) x^2 + \frac{1}{6} \frac{s (1 - s)(2 - s)}{(1 + \xi)^{3 - s}} x^3 \\
&\ge 1 + s x - \frac{1}{8} x^2 - \frac{1}{3} \frac{\gamma}{(1 - \gamma)^3} x^2 = 1 + s x - \omega(\gamma) x^2,
\end{aligned} \tag{7.19}$$

where we define $\omega(\gamma) = \frac{1}{8} + \frac{\gamma}{3 (1 - \gamma)^3}$.
The first equality holds for some $\xi \in [-\gamma, \gamma]$ by the mean value
form of Taylor's theorem. The inequality is obtained by minimizing the
coefficient of $x^2$ over $s$, while for the last term we observe that since
$x \ge -\gamma$ we have $x^3 \ge -\gamma x^2$; furthermore
$(1 + \xi)^{3 - s} \ge (1 - \gamma)^3$ and $s (1 - s)(2 - s) \le 2$. When we
substitute the previous inequalities we obtain the lower bound in (7.19).
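The Taylor lower bound can be spot-checked numerically. The sketch below evaluates the gap between $(1 + x)^s$ and $1 + s x - \omega(\gamma) x^2$ on a grid, taking $\omega(\gamma) = \frac{1}{8} + \frac{\gamma}{3 (1 - \gamma)^3}$; the grid size and function names are illustrative, and a grid check is a sanity check rather than a proof.

```python
def omega(gamma: float) -> float:
    """omega(gamma) = 1/8 + gamma / (3 * (1 - gamma)^3)."""
    return 0.125 + gamma / (3.0 * (1.0 - gamma) ** 3)


def taylor_gap(x: float, s: float, gamma: float) -> float:
    """(1 + x)^s minus the quadratic lower bound 1 + s*x - omega(gamma)*x^2."""
    return (1.0 + x) ** s - (1.0 + s * x - omega(gamma) * x * x)


def min_taylor_gap(gamma: float, grid: int = 200) -> float:
    """Smallest gap over x in [-gamma, gamma] and s in (0, 1) on a uniform grid."""
    xs = [-gamma + 2.0 * gamma * i / grid for i in range(grid + 1)]
    ss = [j / grid for j in range(1, grid)]
    return min(taylor_gap(x, s, gamma) for x in xs for s in ss)
```

A non-negative value of `min_taylor_gap(gamma)` (up to rounding) for several values of `gamma` in $(0, 1)$ is consistent with the bound, and `omega(0.0)` returns the constant $\frac{1}{8}$ that appears in the limit.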

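Using this, we can lower bound $\Lambda_n(s)$ for all $s \in (0, 1)$. Fix
$0 < \gamma < 1$. As before, we will use the shorthand $L_n = L_n(X_1)$. Then,
for sufficiently large $n$ we have $\epsilon_n < \gamma$, so that
$-\gamma < \epsilon_n (L_n - 1)$. Therefore, using (7.19) and assuming $n$
sufficiently large, we can write

$$\begin{aligned}
\Lambda_n(s) &= E_0\left[ (1 + \epsilon_n (L_n - 1))^s \right] \\
&\ge E_0\left[ (1 + \epsilon_n (L_n - 1))^s \mathbf{1}_{\{\epsilon_n (L_n - 1) \le \gamma\}} \right] \\
&\ge E_0\left[ (1 + s \epsilon_n (L_n - 1)) \mathbf{1}_{\{\epsilon_n (L_n - 1) \le \gamma\}} \right] - \omega(\gamma) \epsilon_n^2 D_n^2 \\
&= 1 - \omega(\gamma) \epsilon_n^2 D_n^2 - E_0\left[ (1 + s \epsilon_n (L_n - 1)) \mathbf{1}_{\{\epsilon_n (L_n - 1) > \gamma\}} \right] \\
&\ge 1 - \omega(\gamma) \epsilon_n^2 D_n^2 - E_0\left[ (1 + \epsilon_n (L_n - 1)) \mathbf{1}_{\{\epsilon_n (L_n - 1) > \gamma\}} \right] \\
&\ge 1 - \omega(\gamma) \epsilon_n^2 D_n^2 - E_0\left[ \left( \frac{1}{\gamma^2} + \frac{1}{\gamma} \right) \epsilon_n^2 (L_n - 1)^2 \mathbf{1}_{\{\epsilon_n (L_n - 1) > \gamma\}} \right] \\
&\ge 1 - \left( \omega(\gamma) + \frac{2}{\gamma^2} E_0\left[ \frac{(L_n - 1)^2}{D_n^2} \mathbf{1}_{\{L_n > 1 + \gamma / \epsilon_n\}} \right] \right) \epsilon_n^2 D_n^2 \\
&\ge 1 - \left( \omega(\gamma) + 2 \gamma \right) \epsilon_n^2 D_n^2,
\end{aligned}$$

where in the equality we used the fact that $E_0[L_n - 1] = 0$, and in the
third inequality we replaced $s$ by its maximum value $s = 1$. In the fourth
inequality we used the property that on the set
$\{L_n > 1 + \gamma / \epsilon_n\}$ we can write
$\frac{1}{\gamma^2} \epsilon_n^2 (L_n - 1)^2 > 1$ and
$\frac{1}{\gamma} \epsilon_n^2 (L_n - 1)^2 > \epsilon_n (L_n - 1)$, and the
following step uses $\frac{1}{\gamma} \le \frac{1}{\gamma^2}$ for
$0 < \gamma < 1$. Finally, in the last inequality, using the condition of
Theorem 3.1 and assuming $n$ sufficiently large, the expectation becomes
smaller than $\gamma^3$.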

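Using the previous result we obtain

$$\liminf_{n \to \infty} \frac{\log \Lambda_n(s_n)}{\epsilon_n^2 D_n^2} \ge \liminf_{n \to \infty} \frac{\log\left( 1 - (\omega(\gamma) + 2 \gamma) \epsilon_n^2 D_n^2 \right)}{\epsilon_n^2 D_n^2} = -(\omega(\gamma) + 2 \gamma),$$

where for the equality we used the limit
$\lim_{x \to 0} \frac{\log(1 - c x)}{x} = -c$ and the assumption that
$\epsilon_n D_n \to 0$. Letting $\gamma \to 0$ establishes the lemma since
$\omega(0) = \frac{1}{8}$.

The proof is identical under $H_{1,n}$, where the analogue of the lemma is the
same lower bound of $-\frac{1}{8}$ on the corresponding liminf. □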
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8. Moderate Signals: Gaussian Location Model. Assume | > 
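Recall from the proof of Thm 3.1

$$P_{\mathrm{FA}}(n) \le \left( E_0\left[ \sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)} \right] \right)^n \tag{8.1}$$

and

$$E_0\left[ \sqrt{1 - \epsilon_n + \epsilon_n L_n(X_1)} \right] = 1 - \frac{1}{2} E_0\left[ \frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{\left( 1 + \sqrt{1 + \epsilon_n (L_n(X_1) - 1)} \right)^2} \right]. \tag{8.2}$$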

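We write the observations as a multiple of $\mu_n$, $x = \alpha \mu_n$. Then,
taking $\mu_n = \sqrt{2 r \log n}$, we have

$$L_n(x) = n^{r (2 \alpha - 1)}. \tag{8.3}$$

In view of (8.3),

$$\epsilon_n (L_n - 1) = n^{r (2 \alpha - 1) - \beta} - n^{-\beta}. \tag{8.4}$$

Thus, if $r (2 \alpha - 1) - \beta > 0$ we have
$\epsilon_n (L_n - 1) \to \infty$ and if $r (2 \alpha - 1) - \beta < 0$
we have $\epsilon_n (L_n - 1) \to 0$ as $n \to \infty$.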

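Let $\kappa = \frac{\beta}{2r} + \frac{1}{2}$. Note
$\frac{1}{(1 + \sqrt{1 + x})^2} \ge \frac{1}{4} - \frac{x}{8}$ for $x > -1$.
Thus, on the event $\{X_1 \le \kappa \mu_n\}$,

$$\begin{aligned}
\frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{\left( 1 + \sqrt{1 + \epsilon_n (L_n(X_1) - 1)} \right)^2}
&\ge \frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{4} - \frac{\epsilon_n^3 (L_n(X_1) - 1)^3}{8} \\
&\ge \epsilon_n^2 (L_n(X_1) - 1)^2 \left( \frac{1}{4} - \frac{1 - n^{-\beta}}{8} \right) && (8.5) \\
&\ge \frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{8} && (8.6)
\end{aligned}$$

where (8.5) follows from
$-1 < -n^{-\beta} \le \epsilon_n (L_n(X_1) - 1) \le 1 - n^{-\beta}$ on
$\{X_1 \le \kappa \mu_n\}$, and (8.6) follows from non-negativity of the terms
involved. Then,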


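$$\begin{aligned}
E_0\left[ \frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{\left( 1 + \sqrt{1 + \epsilon_n (L_n(X_1) - 1)} \right)^2} \right]
&\ge E_0\left[ \frac{\epsilon_n^2 (L_n(X_1) - 1)^2}{\left( 1 + \sqrt{1 + \epsilon_n (L_n(X_1) - 1)} \right)^2} \mathbf{1}_{\{X_1 \le \kappa \mu_n\}} \right] && (8.7) \\
&\ge \frac{\epsilon_n^2}{8} E_0\left[ (L_n(X_1) - 1)^2 \mathbf{1}_{\{X_1 \le \kappa \mu_n\}} \right] && (8.8) \\
&= \frac{\epsilon_n^2}{8} \left( e^{\mu_n^2} \Phi((\kappa - 2) \mu_n) - 2 \Phi((\kappa - 1) \mu_n) + \Phi(\kappa \mu_n) \right) && (8.9)
\end{aligned}$$

where (8.7) follows from non-negativity, (8.8) follows from (8.6), and $\Phi$
denotes the standard normal cumulative distribution function.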
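The closed form of the truncated second moment can be recovered by completing the square. The following sketch assumes the Gaussian location likelihood ratio $L_n(x) = \exp(\mu_n x - \mu_n^2/2)$ and writes $\varphi$ for the standard normal density and $a = \kappa \mu_n$ for the truncation point; these symbols are introduced here only for the computation.

```latex
\begin{align*}
E_0\big[ L_n(X_1)^2 \mathbf{1}_{\{X_1 \le a\}} \big]
  &= \int_{-\infty}^{a} e^{2 \mu_n x - \mu_n^2} \varphi(x) \, dx
   = e^{\mu_n^2} \int_{-\infty}^{a} \varphi(x - 2 \mu_n) \, dx
   = e^{\mu_n^2} \Phi(a - 2 \mu_n), \\
E_0\big[ L_n(X_1) \mathbf{1}_{\{X_1 \le a\}} \big]
  &= \int_{-\infty}^{a} \varphi(x - \mu_n) \, dx = \Phi(a - \mu_n), \\
E_0\big[ (L_n(X_1) - 1)^2 \mathbf{1}_{\{X_1 \le a\}} \big]
  &= e^{\mu_n^2} \Phi(a - 2 \mu_n) - 2 \Phi(a - \mu_n) + \Phi(a).
\end{align*}
```

Setting $a = \kappa \mu_n$ gives the three terms $\Phi((\kappa - 2) \mu_n)$, $\Phi((\kappa - 1) \mu_n)$ and $\Phi(\kappa \mu_n)$.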

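By the standard approximation
$Q(x) \approx \frac{1}{x \sqrt{2 \pi}} e^{-x^2 / 2}$ for large $x$, the dominant
term in (8.9) is $\epsilon_n^2 e^{\mu_n^2} \Phi((\kappa - 2) \mu_n) / 8$ and is
of the order of

$$\frac{n^{-2 \beta + 2 r - r \left( 1.5 - \frac{\beta}{2r} \right)^2}}{\sqrt{2 r \log n}}$$

which tends to zero by our assumption on $r$. Thus, as in the proof of Thm
3.1, by (8.1)

$$\frac{\log P_{\mathrm{FA}}(n)}{n} \le \log\left( 1 - \frac{\epsilon_n^2}{16} \left( e^{\mu_n^2} \Phi((\kappa - 2) \mu_n) - 2 \Phi((\kappa - 1) \mu_n) + \Phi(\kappa \mu_n) \right) \right).$$

Dividing both sides by $\epsilon_n^2 e^{\mu_n^2} \Phi((\kappa - 2) \mu_n)$ and
taking a limsup yields

$$\limsup_{n \to \infty} \frac{\log P_{\mathrm{FA}}(n)}{n \epsilon_n^2 e^{\mu_n^2} \Phi((\kappa - 2) \mu_n)} \le -\frac{1}{16}.$$

For consistency, it suffices to require
$n \epsilon_n^2 e^{\mu_n^2} \Phi((\kappa - 2) \mu_n) \to \infty$. Thus, it
suffices to require
$1 - 2 \beta + 2 r - r \left( 1.5 - \frac{\beta}{2r} \right)^2 > 0$, since
$\sqrt{\log n}$ is negligible with respect to any positive power of $n$.
Combining the constraints
$-2 \beta + 2 r - r \left( 1.5 - \frac{\beta}{2r} \right)^2 < 0$ and
$1 - 2 \beta + 2 r - r \left( 1.5 - \frac{\beta}{2r} \right)^2 > 0$ with the
standing assumptions yields the desired rate characterization.

The proof for $P_{\mathrm{MD}}$ is identical. Note that this bound is likely
not tight (even if it has the right order), since we neglected the event
$\{X_1 > \kappa \mu_n\}$ to form the bound.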